


	by dangoldin 3377 days ago
	Yeah, it does seem a bit high. We use Spark for our adtech data pipeline and handle tens of billions of events a day in less time. The difference may come from how much data they're pulling in from other systems, or from writing the results back out to a variety of systems. Spark itself is parallelizable, so in theory it can be sped up just by running more nodes.

1 comment

sheeshkebab 3376 days ago

Financial processing is typically sequential: you can't calculate some metric until some other thing has been calculated (or the data for it has been pulled). In other words, it's not well parallelizable. Or so it is with some systems I deal with.
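
To make the dependency argument concrete, here is a minimal Python sketch. The task names, costs, and dependencies are all made up for illustration and do not come from any system mentioned in this thread. It compares the total work in a job with its longest dependency chain; with unlimited nodes, the job still can't finish faster than that chain.

```python
from functools import lru_cache

# Hypothetical pipeline: task name -> (cost in minutes, tasks it depends on).
# These names and numbers are invented for illustration only.
TASKS = {
    "load_trades": (10, []),
    "load_prices": (10, []),
    "positions": (20, ["load_trades"]),
    "valuation": (30, ["positions", "load_prices"]),
    "risk": (40, ["valuation"]),
}


def total_work(tasks):
    """Time to run every task one after another on a single worker."""
    return sum(cost for cost, _ in tasks.values())


def critical_path(tasks):
    """Length of the longest dependency chain.

    This is a lower bound on wall-clock time no matter how many
    workers are available, because each task must wait for its inputs.
    """

    @lru_cache(maxsize=None)
    def finish(name):
        cost, deps = tasks[name]
        # A task can start only after its slowest dependency has finished.
        return cost + max((finish(dep) for dep in deps), default=0)

    return max(finish(name) for name in tasks)


if __name__ == "__main__":
    serial = total_work(TASKS)
    chain = critical_path(TASKS)
    print(f"serial time: {serial} min, longest chain: {chain} min")
    print(f"best possible speedup: {serial / chain:.2f}x")
```

In this made-up example the serial total is 110 minutes and the longest chain is 100 minutes, so adding nodes can't buy much. A job whose tasks are mostly independent would show the opposite picture.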
