| Yes, this is the argument that VoltDB made as well: https://www.voltdb.com/blog/foundationdbs-lesson-fast-key-va... My feelings on this topic are mixed. On the one hand, I think many of the specific examples chosen in that post are false (and have told John as much in person). On the other hand, the general point that you can squeeze out constant factor performance improvements by violating abstraction boundaries is obviously usually true. Nevertheless, I still think this is a bad argument. While it's true that abstractions are rarely costless, they can often be made so cheap that the low-hanging performance fruit is elsewhere. And in particular, cheap enough that they're worth it when you consider all the other benefits that they bring. When I built a query language and optimizer on top of FoundationDB, my inability to push type information down into the storage engine was about the last thing on my mind. Perhaps someday when I'd made everything else perfect it would've become a big pain (and perhaps someday we would've provided more mechanisms for piercing the various abstractions and providing workload hints), but in the meantime partitioning off all state in the system into a dedicated component that handled it extremely well made the combined distributed systems problem massively more tractable. The increased developer velocity and reduced bugginess in turn meant that I (as a user of the key-value store) could spend scarce engineering resources on other sorts of performance improvements that more than compensated for the theoretical overhead imposed by the abstraction. I won't claim that a transactional ordered key-value store is the perfect database abstraction for every situation, but it's one that I've found myself missing a great deal since leaving Apple. But I'm glad to hear that things are going well for you guys. Best of luck, this is a brutal business! |
I still think many of the arguments in that blog post hold up for non-embedded KV stores. I think you can mitigate a lot by aggressively caching metadata, but eventually you end up moving the SQL engine closer and closer to the storage layer to get performance. And yeah, you end up more monolithic and testing gets harder. Sigh.
Some of this is workload dependent. If you're not touching many rows in your queries and transactions, then you can get away with a lot more. But if you give someone SQL, they're going to want to scan.
I wouldn't mind being proven wrong. Maybe Apple made FDB run SQL at legit speeds. I haven't seen much from public projects that work this way to change my mind yet.
> I won't claim that a transactional ordered key-value store is the perfect database abstraction for every situation, but it's one that I've found myself missing a great deal since leaving Apple.
How does Spanner not satisfy that itch? Not ordered matters?