| HN Mirror

	by ww520 1807 days ago
	It really comes down to the latency and throughput requirements. Projects are architected differently for different latency and throughput expectations. In event processing there's a continuum of expected latencies, from batch processing to realtime. Batch processing typically means running reports over a large volume of events (good for throughput); Hadoop is a good example. At the other end, sub-second realtime reporting is possible with Heron/Storm. Spark is kind of in the middle with its hybrid mini-batching. Reportedly, Twitter has used Heron/Storm to track word counts across all tweets to find trending topics, where the latency from a new tweet coming in to the word counts being updated over the whole network is in the hundreds of milliseconds.
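
To make the continuum concrete, here is a toy single-process sketch in plain Python of the same word count done three ways. It is not how Hadoop, Spark, or Heron/Storm are implemented; the function names and the `batch_size` parameter are made up for illustration. The only point is when results become visible: once at the end (batch), after every few events (mini-batch), or after every event (streaming).

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_counts(tweets: List[str]) -> Counter:
    """Batch: read the whole data set, emit one result at the end."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts


def streaming_counts(tweets: Iterable[str]) -> Iterator[Counter]:
    """Streaming: update on every event, so a fresh result is visible each time."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
        yield counts.copy()  # snapshot after each tweet


def mini_batch_counts(tweets: Iterable[str], batch_size: int) -> Iterator[Counter]:
    """Mini-batch: buffer a few events, then update and emit once per buffer."""
    counts: Counter = Counter()
    buffer: List[str] = []
    for tweet in tweets:
        buffer.append(tweet)
        if len(buffer) == batch_size:
            for buffered in buffer:
                counts.update(buffered.lower().split())
            buffer.clear()
            yield counts.copy()
    if buffer:  # flush whatever is left over at the end
        for buffered in buffer:
            counts.update(buffered.lower().split())
        yield counts.copy()


if __name__ == "__main__":
    tweets = ["spark storm", "storm heron", "storm"]
    print(batch_counts(tweets))
    print(len(list(streaming_counts(tweets))), "streaming snapshots")
    print(len(list(mini_batch_counts(tweets, batch_size=2))), "mini-batch snapshots")
```

All three end up with the same final counts. What differs is how stale the visible result can be, and how much per-event overhead you pay to keep it fresh.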