

by russell_sears 3616 days ago

Hi. First author of the bLSM (Yahoo) paper here. You've pretty much nailed it: The trick is getting latency and volume at the same time. If you're willing to wait an hour to look at event logs, then Hadoop is good enough, but where's the fun in that?

At the time, we were looking for the best of both worlds: Hadoop and log processing give high-throughput writes, but their latencies are much too high to act on user intent within a single session. We had a lot of applications in mind that needed low-latency access to event log data, but didn't have a suitable storage engine to back them.

Existing LSM trees suffered from write stalls and excessive read amplification. We were targeting hard disks back then, and didn't want to pay for the extra seeks. We got random read access down to about one seek per read, and figured out how to eliminate write stalls. Surprisingly, we still ended up with good write throughput.

More importantly, from an application perspective, we provide efficient range scans, which let you reason about groups of data with matching or contiguous keys. If you squint hard enough, this is all MapReduce really does, except it precomputes all the answers up front, and we do it in real time. On the one hand, we have lower throughput, since we perform more random reads / sequential writes, and also do more thread synchronization. On the other, we only perform the computations for data that's actually being used for something, which means we do a lot less work for the right applications.
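To make the range scan point concrete, here's a rough sketch of the access pattern. This is a toy in-memory sorted store, not bLSM's API: `SortedStore`, `scan_prefix`, and the `user/sequence` key layout are all made up for illustration. The idea is just that when keys are kept in order, every event for one user sits in a contiguous run, so you can pull out that group on demand instead of precomputing it for everyone.

```python
import bisect


class SortedStore:
    """Toy ordered key-value store (hypothetical; stands in for any engine with range scans)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._vals = {}   # key -> value

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for k in self._keys[lo:hi]:
            yield k, self._vals[k]

    def scan_prefix(self, prefix):
        """Scan every key that starts with a non-empty prefix."""
        # Smallest string greater than all keys with this prefix:
        # bump the prefix's last character by one.
        end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
        return self.scan(prefix, end)


store = SortedStore()
# Events arrive interleaved across users, in no particular order.
store.put("user42/0003", "checkout")
store.put("user17/0001", "search")
store.put("user42/0001", "search")
store.put("user42/0002", "click")

# One contiguous scan returns a single user's events, already ordered.
session = [event for _, event in store.scan_prefix("user42/")]
print(session)
```

The work done is proportional to the size of the group you actually ask for, which is the "only compute what's being used" trade-off described above.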

Thanks!