Hacker News
by mjpt777 4987 days ago
Interesting. It sounds like your problems are I/O-dominated, since you do not mind paying Hadoop's JVM startup cost for each query on each node. I'm more often looking at large data sets that are entirely memory-resident, which tends to drive the design this way. In finance, queries need latencies well under a second, which Hadoop cannot come close to satisfying. This is comparing batch to real-time analytics.

You're right that most of my big-data experience is batch work, and outside of finance. I guess I'm finding it hard to envision the kind of data where you'd want to work on the whole set, yet that set is small enough to fit into memory. For real-time analytics, wouldn't you want to stream the data and reduce it to the representation you want as it comes in?
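
To make the streaming idea concrete, here is a rough Python sketch of what I have in mind: each event is folded into a small per-key summary as it arrives, and the raw event is then dropped. The names (`RunningStats`, `reduce_stream`) and the tick data are made up for illustration; this is not anyone's production design, just one way the reduction could look.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Per-key summary; the raw events are not retained after update()."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        # Fold one new observation into the summary in O(1) time and space.
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def reduce_stream(events):
    """Consume an iterable of (key, value) events and return per-key summaries."""
    stats = defaultdict(RunningStats)
    for key, value in events:
        stats[key].update(value)
    return stats


# Hypothetical tick data: (symbol, price). In practice this would be a live feed.
ticks = [("ABC", 10.0), ("XYZ", 5.0), ("ABC", 12.0), ("ABC", 11.0)]
summary = reduce_stream(ticks)
print(summary["ABC"].count, summary["ABC"].mean, summary["ABC"].maximum)
```

The catch, of course, is that this only answers the questions you decided on up front; an ad hoc query over the whole history still needs the full set somewhere.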