by jhugg 3161 days ago

As I mentioned in a previous comment, you can fix some of the problems; caching metadata is a great example.

The real issue is not whether abstractions are good, but at what layer and how pure/leaky they should be.

If I build a SQL engine on top of RocksDB, I still need a way to scan a bunch of tuples and apply a predicate. It's probably faster if RocksDB lets me hand over a predicate and returns an iterator of matching tuples than if I have to iterate on top of RocksDB and filter myself. Maybe this difference is large -- maybe not. It depends on a lot of details. Certainly a custom storage layer tuned to apply predicates fast is substantially faster.
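A toy sketch of the two options, to make the difference concrete. ToyStore, scan, and scan_where are hypothetical names for an in-memory stand-in, not RocksDB's API; the only thing measured is how many tuples cross the storage-layer boundary.

```python
from typing import Callable, Dict, Iterator, Tuple

Row = Tuple[bytes, dict]


class ToyStore:
    """Hypothetical sorted KV store (illustration only, not RocksDB's API)."""

    def __init__(self, rows: Dict[bytes, dict]) -> None:
        self._rows = sorted(rows.items())  # keys are unique, so only keys are compared
        self.rows_returned = 0             # tuples handed across the storage API

    def scan(self) -> Iterator[Row]:
        """Plain iterator: every tuple is handed to the caller."""
        for kv in self._rows:
            self.rows_returned += 1
            yield kv

    def scan_where(self, pred: Callable[[dict], bool]) -> Iterator[Row]:
        """Predicate handed to the store: only matching tuples are returned."""
        for key, value in self._rows:
            if pred(value):
                self.rows_returned += 1
                yield key, value


def make_store() -> ToyStore:
    return ToyStore({b"order:%04d" % i: {"total": i * 10} for i in range(1000)})


def is_big(value: dict) -> bool:
    return value["total"] >= 9900


# Option 1: the SQL layer iterates and filters on top of the store.
above = make_store()
filtered_above = [kv for kv in above.scan() if is_big(kv[1])]

# Option 2: the SQL layer hands the predicate to the store.
below = make_store()
filtered_below = list(below.scan_where(is_big))

print(above.rows_returned, below.rows_returned)
```

Both options produce the same rows; the second just moves far fewer tuples through the interface. Whether that translates into a large real-world speedup depends on the details mentioned above (decoding cost, copy cost, selectivity).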

If I build a SQL engine on top of a distributed KV store, then I really want to push the predicate scan down to the individual nodes, and I probably still want to push the predicate down even lower, into each node's storage layer. For most queries, I also want the SQL layer to understand how the data is partitioned.
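The distributed case can be sketched the same way. Everything here is an assumption for illustration: Node and Cluster are hypothetical, the cluster is range-partitioned on an integer key, and a "network call" is just a method call. The point is that each node filters locally, and a planner that knows the partitioning can skip nodes entirely.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[int, dict]
Pred = Callable[[int, dict], bool]


class Node:
    """Hypothetical storage node that can evaluate a predicate locally."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self.rows_shipped = 0  # rows sent back to the SQL layer

    def scan_where(self, pred: Pred) -> List[Row]:
        out = [(k, v) for k, v in sorted(self.rows.items()) if pred(k, v)]
        self.rows_shipped += len(out)
        return out


class Cluster:
    """Range-partitioned toy cluster: node i owns keys [i*span, (i+1)*span)."""

    def __init__(self, n_nodes: int, span: int) -> None:
        self.span = span
        self.nodes = [Node() for _ in range(n_nodes)]
        self.nodes_contacted = 0  # nodes touched by the most recent query

    def put(self, key: int, value: dict) -> None:
        self.nodes[key // self.span].rows[key] = value

    def query(self, pred: Pred, key_range: Optional[Tuple[int, int]] = None) -> List[Row]:
        """Push pred to nodes; if key_range (inclusive) is known, prune nodes."""
        if key_range is None:
            targets = self.nodes  # no partition knowledge: ask everyone
        else:
            lo, hi = key_range
            targets = self.nodes[lo // self.span : hi // self.span + 1]
        self.nodes_contacted = len(targets)
        results: List[Row] = []
        for node in targets:
            results.extend(node.scan_where(pred))
        return results


cluster = Cluster(n_nodes=4, span=250)
for k in range(1000):
    cluster.put(k, {"status": "open" if k % 100 == 0 else "closed"})


def open_in_range(k: int, v: dict) -> bool:
    return 300 <= k <= 420 and v["status"] == "open"


broadcast = cluster.query(open_in_range)
contacted_all = cluster.nodes_contacted

pruned = cluster.query(open_in_range, key_range=(300, 420))
contacted_pruned = cluster.nodes_contacted

print(contacted_all, contacted_pruned)
```

Note how much the SQL layer has to know for the pruned query to work: the partitioning scheme, the key encoding, and the fact that nodes accept predicates. That is the abstraction leaking.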

You can do all of this, but the abstraction gets leakier and leakier as you start to get reasonable performance. At the time, the FDB SQL layer didn't seem to do any of this. Maybe now at Apple it is much smarter and more intertwined.

The planner issue you mention is real, but I'm slightly more optimistic that engineers are willing to identify slow queries and figure out how to adapt them to the new system if the rewards are clear.

N.B. If you're using SQL for KV gets/puts, or if you're joining one row to a handful of others by primary key (e.g. looking up the order items in an order), then this stuff doesn't matter much. But if you give someone a SQL layer, odds are they'll want to run a real query sooner or later, even an OLTP-ish one.

To address the "completely speculative 'post-mortem' by someone who knew nothing about the technology" bit: I was only talking about the FDB SQL layer performance and design, much of which was public at the time of the acquisition.