Re: Optimizing basic operations
- Posted by gimlet Aug 16, 2013
James,
The talk has mostly been about the speed of processing the data. You need to consider first of all the upload time. Given 10^12 records, each with an evaluation integer and 50 numbers, you have a very large amount to load, say 200 TB. According to Tom's Hardware, the fastest off-the-shelf, affordable SSDs have data transfer rates approximating 0.5 GB per second. That is to say, you would be looking at an upload time of 400,000 seconds or so. That is 111 hours, or about 4.6 days.

The processing of the records (assuming random data) could be expected to take 40,000 seconds (based on Derek's numbers). That is to say, 11 hours. Since the load time is roughly ten times the processing time, the job is limited by disk transfer rather than by the CPU, so multiplying the number of machines used is much more effective than using a faster processor. E.g. 10 machines, each loading a tenth of the data, would get the load time down to 11 hours and the processing down to 1.1 hours. Cutting down the amount of data would also help (but, of course, you still need to keep the identifying data together with the evaluation).
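
For anyone who wants to replay the arithmetic with their own figures, here is a rough sketch in Python. The 4 bytes per number is my assumption (it is what makes 10^12 records of 51 numbers come out near 200 TB), the constant and function names are just for illustration, and the model assumes the data splits evenly across machines with perfectly linear scaling, which a real setup will not quite reach.

```python
# Back-of-the-envelope estimate of load and processing time.
RECORDS = 10**12
NUMBERS_PER_RECORD = 51      # 1 evaluation integer + 50 numbers
BYTES_PER_NUMBER = 4         # ASSUMPTION: 32-bit values (not stated in the thread)
SSD_BYTES_PER_SEC = 0.5e9    # roughly 0.5 GB per second per machine
PROCESS_SECONDS = 40_000     # single-machine processing estimate quoted above


def estimate(machines=1):
    """Return (total_bytes, load_hours, process_hours) for a given machine count.

    ASSUMPTION: the data is split evenly and every machine reads from its
    own SSD, so both load and processing time scale as 1 / machines.
    """
    total_bytes = RECORDS * NUMBERS_PER_RECORD * BYTES_PER_NUMBER
    load_seconds = total_bytes / SSD_BYTES_PER_SEC / machines
    process_seconds = PROCESS_SECONDS / machines
    return total_bytes, load_seconds / 3600, process_seconds / 3600


if __name__ == "__main__":
    for n in (1, 10):
        total, load_h, proc_h = estimate(n)
        print(f"{n:>2} machine(s): {total / 1e12:.0f} TB, "
              f"load {load_h:.1f} h, process {proc_h:.1f} h")
```

With these inputs the unrounded size is 204 TB, so the single-machine load time comes out at about 113 hours rather than the 111 hours you get from a round 200 TB. Either way the load dominates the processing by a factor of ten.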