by nathants 883 days ago
the idea of map reduce remains a good one.
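
for anyone who hasn't seen it spelled out, here is a minimal single-process sketch of the map, shuffle, reduce shape, using word count as the example. the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration and are not from hadoop or any real framework; a real system would run the phases on many machines and move the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # map: turn one input record into (key, value) pairs,
    # here one (word, 1) pair per word in the line
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    # shuffle: group all values by key, the step a framework
    # performs between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # reduce: collapse all values for one key into a single result
    return key, sum(values)


def word_count(lines):
    # hypothetical driver: runs all three phases in one process
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

each phase only sees its own records or its own key, which is what lets the same shape be split across machines.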

there are a number of interesting innovations in streaming systems that followed, mostly around reducing latency, reducing batch size, and failure strategies.

even hadoop could be hard to debug when hitting a performance ceiling on challenging workloads. the streaming systems took this even further, with spark being notorious for the "fiddle with knobs and pray the next job doesn’t fail after a few hours, again" workflow.

i played around with the thinnest possible distributed data stack a while back[1][2]. i wanted to understand the performance ceiling for different workloads without all the impenetrable layers of software bureaucracy. turns out modern networks and cpus are really fast when you stop stacking random layers like lasagna.

i think the future of data, for serious workloads, is gonna be bespoke. the primitives are just too good now, and the tradeoff for understandability is often worth the cost.

1. https://github.com/nathants/s4

2. https://github.com/nathants/bsv