Hacker News
by d--b 3388 days ago
Is it really so big that we need to talk about "scaling" it? 4-5 hours of processing time is a lot... I mean, how many transactions would they do on a normal day?

Yeah - it does seem a bit high. We use Spark for our adtech data pipeline and we handle tens of billions of events a day in less time. It may be a function of how much data they're pulling in from other systems, or of dumping the data back into a variety of systems. Spark itself parallelizes well, so in theory the job can be sped up just by running more nodes.
Financial processing is typically sequential - you can't calculate some metric until some other thing has been calculated (or its data pulled)... not well parallelizable, in other words. Or so it is with some systems I deal with.
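
To make the dependency point concrete, here is a rough Python sketch. The job names, costs, and dependencies are entirely made up for illustration and don't describe any real system. It compares the total work with the longest dependency chain (the critical path), which is the floor on wall-clock time no matter how many nodes you add.

```python
from functools import lru_cache

# Hypothetical jobs: name -> (cost in minutes, jobs it must wait for).
# All names and numbers are invented for this example.
JOBS = {
    "load_trades": (60, []),
    "load_prices": (30, []),
    "positions":   (45, ["load_trades"]),
    "valuation":   (50, ["positions", "load_prices"]),
    "risk":        (70, ["valuation"]),
    "report":      (15, ["risk"]),
}

@lru_cache(maxsize=None)
def finish_time(job):
    """Earliest finish time of a job, assuming unlimited workers."""
    cost, deps = JOBS[job]
    # A job can only start once its slowest dependency has finished.
    return cost + max((finish_time(d) for d in deps), default=0)

# Time on a single worker: every job runs back to back.
total_work = sum(cost for cost, _ in JOBS.values())

# Time with unlimited workers: the longest chain of dependent jobs.
critical_path = max(finish_time(job) for job in JOBS)

print(f"one worker:        {total_work} min")
print(f"unlimited workers: {critical_path} min")
print(f"best-case speedup: {total_work / critical_path:.2f}x")
```

With these made-up numbers, only load_prices can overlap with anything else, so extra nodes barely help. A pipeline of mostly independent events, like the adtech case above, would look very different.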