by cmroanirgo 2777 days ago
Hmmm. I used to be part of a team that handled market data at crazy rates and we took exactly the opposite approach to these guys.

When I see:

"You Can Lose a Few Datapoints Here and There"

I think these guys are barking up the wrong tree.

1. We used a single thread per network card. (Yes, we architected clusters/failovers, etc., but not once were they required because of data rates.)

2. The server could handle a fully saturated gigabit network at under 50% CPU (per core).

3. Data was NEVER thrown away (but our API allowed the client reading the data to drop updates and get sub-second aggregates instead, e.g. OHLC or summation).

4. Data was stored in basically flat file systems.

5. Our calculation engine was run 'downstream' toward the client ends, or on the client end, away from data collection. If needed (i.e., if the calcs were expensive to run), their results could be fed back into the server for long-term storage.
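Point 3 describes letting a slow client take sub-second aggregates instead of every update. Here is a minimal Python sketch of that kind of conflation, assuming ticks arrive in timestamp order as `(timestamp_ns, price, size)` tuples. The names `Bar` and `conflate` and the 500 ms window are hypothetical, not taken from the system described; the point is that the stored ticks are only read, so nothing is thrown away.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical tick shape: (timestamp in ns since epoch, price, size).
Tick = Tuple[int, float, float]


@dataclass
class Bar:
    start_ns: int  # start of the sub-second window
    open: float
    high: float
    low: float
    close: float
    volume: float  # summation of sizes within the window


def conflate(ticks: Iterable[Tick], window_ns: int) -> List[Bar]:
    """Collapse time-ordered ticks into one OHLC + volume bar per window.

    The input is only read, never modified, so the full tick history
    stays available to clients that want every update.
    """
    bars: List[Bar] = []
    current: Optional[Bar] = None
    for ts, price, size in ticks:
        start = ts - ts % window_ns  # align to the window boundary
        if current is None or start != current.start_ns:
            if current is not None:
                bars.append(current)
            current = Bar(start, price, price, price, price, size)
        else:
            current.high = max(current.high, price)
            current.low = min(current.low, price)
            current.close = price
            current.volume += size
    if current is not None:
        bars.append(current)
    return bars


# Usage: four ticks conflated into 500 ms bars.
ticks = [(0, 10.0, 1.0), (100_000_000, 10.5, 2.0),
         (200_000_000, 9.8, 1.0), (600_000_000, 10.1, 3.0)]
bars = conflate(ticks, window_ns=500_000_000)
```

A client that keeps up reads the raw ticks; one that can't reads the bars instead, which caps its message rate at one bar per window per instrument.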
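Point 4 (flat files) pairs naturally with fixed-width binary records: appends never rewrite existing bytes, and record n sits at byte offset `n * record_size`, so a reader can seek straight to it without an index. The 24-byte layout and the function names below are hypothetical; this is a Python sketch of the general technique, not the format that team actually used.

```python
import os
import struct
import tempfile
from typing import Iterable, Iterator, Tuple

# Hypothetical layout: int64 timestamp (ns), float64 price, float64 size,
# little-endian with no padding -> 24 bytes per record.
RECORD = struct.Struct("<qdd")


def append_ticks(path: str, ticks: Iterable[Tuple[int, float, float]]) -> None:
    """Append ticks to the end of a flat file; existing bytes are never rewritten."""
    with open(path, "ab") as f:
        for ts, price, size in ticks:
            f.write(RECORD.pack(ts, price, size))


def read_ticks(path: str, start_index: int = 0) -> Iterator[Tuple[int, float, float]]:
    """Replay records starting at record number start_index."""
    with open(path, "rb") as f:
        f.seek(start_index * RECORD.size)  # fixed width -> direct seek
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file (or a partially written last record)
            yield RECORD.unpack(chunk)


# Usage: write two batches, then replay from the second record onward.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ticks.bin")
    append_ticks(path, [(1, 10.0, 1.0), (2, 10.5, 2.0)])
    append_ticks(path, [(3, 9.8, 1.0)])
    replayed = list(read_ticks(path, start_index=1))
    file_size = os.path.getsize(path)
```

A real store would also need some way to record the layout version (a header or a file-naming convention); that is left out here.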

This was mid-2000. I'm sure this is not rocket science for modern-day time series guys.


Yeah, it's still pretty much the same, just at 10 or 40 Gbit now.

Hardware capture almost never drops packets, and it timestamps them against a GPS-synced clock.

You can then take those capture files and manipulate them however you want into normalized market data.

Market data has the notable feature of being segmented by trading day, so the combination of symbol-venue-date is an appropriately small unit of data to run aggregations of any kind over or to distribute over a cluster.
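The symbol-venue-date partitioning above can be sketched in Python as a group-by followed by an independent aggregation per partition (VWAP here). The tick tuple shape, the `ABC` symbol, the `XNAS`/`XNYS` venue codes and the helper names are illustrative assumptions. Treating the UTC calendar date as the trading day is a simplification, since real venues define sessions in local time.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical normalized tick: (symbol, venue, timestamp_ns, price, size).
Tick = Tuple[str, str, int, float, float]
# Partition key: (symbol, venue, trading date as YYYY-MM-DD).
Key = Tuple[str, str, str]


def partition_key(tick: Tick) -> Key:
    symbol, venue, ts_ns, _, _ = tick
    # Simplification: trading day = UTC calendar date of the timestamp.
    seconds = ts_ns // 1_000_000_000
    day = datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()
    return symbol, venue, day


def partition(ticks: Iterable[Tick]) -> Dict[Key, List[Tick]]:
    """Group ticks by (symbol, venue, date)."""
    parts: Dict[Key, List[Tick]] = defaultdict(list)
    for tick in ticks:
        parts[partition_key(tick)].append(tick)
    return dict(parts)


def vwap(ticks: List[Tick]) -> float:
    """Volume-weighted average price over one partition."""
    notional = sum(price * size for _, _, _, price, size in ticks)
    volume = sum(size for _, _, _, _, size in ticks)
    return notional / volume


# Usage: each partition is independent, so this loop could be fanned out
# across cluster workers; here it simply runs sequentially.
DAY_NS = 86_400 * 1_000_000_000
ticks = [
    ("ABC", "XNAS", 0, 10.0, 100.0),
    ("ABC", "XNAS", 1_000, 11.0, 300.0),
    ("ABC", "XNYS", 2_000, 10.5, 50.0),
    ("ABC", "XNAS", DAY_NS + 5, 12.0, 10.0),
]
result = {key: vwap(part) for key, part in partition(ticks).items()}
```

Because no aggregation crosses a partition boundary, one symbol on one venue for one day is a self-contained unit of work that can be handed to any node.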

So for market data at least, there's not much to "rolling your own" time series DB in Python or what-have-you.

Processing that firehose in real time for trading is a different matter, though, and how you build that depends heavily on your latency requirements.

Right. For those interested OpenHFT has created a really nice set of open source solutions to do this.

https://github.com/OpenHFT/Chronicle-Queue#design

Do you know of any article or book outlining the architecture of a full HFT system, i.e. from market data consumption to pricing to trading? Thanks in advance!