| The costs of market data and of a system equipped to process it are not inconsiderable. Historical tick data (quotes and trades) for US equities will run you $20k/year, minimum. US equities run about 30-40GB/day, so you are looking at almost 10TB/year. You need to be able to access the data quickly, so you probably don't want to compress it and probably do want to duplicate it, slicing and dicing it in different ways. Call it 20TB, minimum, per year of data. Futures and options data will run another $20k/year each; futures has about as much data as equities, and options has an order of magnitude more. That is another 100TB, minimum, per year of data, and probably closer to 200TB. How are you going to study these data? Even if you were able to operate at disk speed (which seems unlikely unless you have lots of disks and have duplicated and distributed your data amongst them), it would take you five minutes to run against one day's worth of data, or close to a day to run a year's simulation. If you try to run more than a few cores in parallel, reading from disk is going to be seriously hampered. None of this even considers the cost of the software necessary to support this effort: contending with corporate actions, tracking halts, etc. This all adds up to a significant financial hurdle. |
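
The storage and run-time figures above can be checked with a quick back-of-envelope calculation. This is only a sketch under stated assumptions: the 252 trading days per year and the 120 MB/s sequential read rate are not from the text (the read rate is simply what a five-minute pass over one day of data implies), and the helper names are made up for illustration.

```python
# Back-of-envelope check of the equities numbers quoted above.
# Assumptions (not from the original text): 252 trading days per year,
# decimal units (1 TB = 1000 GB), and a single disk reading at ~120 MB/s.

TRADING_DAYS_PER_YEAR = 252   # assumption: typical US trading calendar
GB_PER_TB = 1000
MB_PER_GB = 1000


def yearly_tb(gb_per_day, days=TRADING_DAYS_PER_YEAR):
    """Raw data volume per year, in TB, for a given daily volume in GB."""
    return gb_per_day * days / GB_PER_TB


def scan_seconds(gigabytes, mb_per_second):
    """Time to read the given volume once at a sustained sequential rate."""
    return gigabytes * MB_PER_GB / mb_per_second


equities_gb_per_day = 35.0    # midpoint of the 30-40GB/day range in the text
disk_mb_per_s = 120.0         # assumption: one disk, sequential reads

equities_tb = yearly_tb(equities_gb_per_day)
day_minutes = scan_seconds(equities_gb_per_day, disk_mb_per_s) / 60
year_hours = day_minutes * TRADING_DAYS_PER_YEAR / 60

print(f"equities, raw:       {equities_tb:.1f} TB/year")
print(f"one day at disk rate: {day_minutes:.1f} minutes")
print(f"one year at disk rate: {year_hours:.1f} hours")
```

With these inputs the raw equities volume comes to a little under 9TB/year, one day takes about five minutes to read, and a full year takes roughly 20 hours, which lines up with "almost 10TB/year" and "close to a day" above. Changing `disk_mb_per_s` shows how sensitive the run time is to the storage you can afford.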