| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by wwilson 3402 days ago

Yes, this is the argument that VoltDB made as well:

https://www.voltdb.com/blog/foundationdbs-lesson-fast-key-va...

My feelings on this topic are mixed. On the one hand, I think many of the specific examples chosen in that post are false (and have told John as much in person). On the other hand, the general point that you can squeeze out constant factor performance improvements by violating abstraction boundaries is obviously usually true.

Nevertheless, I still think this is a bad argument. While it's true that abstractions are rarely costless, they can often be made so cheap that the low-hanging performance fruit is elsewhere. And in particular, cheap enough that they're worth it when you consider all the other benefits that they bring.

When I built a query language and optimizer on top of FoundationDB, my inability to push type information down into the storage engine was about the last thing on my mind. Perhaps someday when I'd made everything else perfect it would've become a big pain (and perhaps someday we would've provided more mechanisms for piercing the various abstractions and providing workload hints), but in the meantime partitioning off all state in the system into a dedicated component that handled it extremely well made the combined distributed systems problem massively more tractable. The increased developer velocity and reduced bugginess in turn meant that I (as a user of the key-value store) could spend scarce engineering resources on other sorts of performance improvements that more than compensated for the theoretical overhead imposed by the abstraction.

I won't claim that a transactional ordered key-value store is the perfect database abstraction for every situation, but it's one that I've found myself missing a great deal since leaving Apple.

But I'm glad to hear that things are going well for you guys. Best of luck, this is a brutal business!

3 comments

jhugg 3402 days ago

Hi Will. Thanks for the shout-out.

I still think many of the arguments in that blog post hold up for non-embedded KV stores. I think you can mitigate a lot by aggressively caching metadata, but eventually you end up moving the SQL engine closer and closer to the storage layer to get performance. And yeah, you end up more monolithic and testing gets harder. Sigh.

Some of this is workload dependent. If you're not touching many rows in your queries and transactions, then you can get away with a lot more. But if you give someone SQL, they're going to want to scan.

I wouldn't mind being proven wrong. Maybe Apple made FDB run SQL at legit speeds. I haven't seen much from public projects that work this way to change my mind yet.

> I won't claim that a transactional ordered key-value store is the perfect database abstraction for every situation, but it's one that I've found myself missing a great deal since leaving Apple.

How does Spanner not satisfy that itch? Not ordered matters?

link

wwilson 3402 days ago

> How does Spanner not satisfy that itch? Not ordered matters?

I was probably unclear in my previous comment. Spanner is great! (And Spanner is ordered). The particular aspect of FDB that I miss is what some of our old customers called "the bottom half of a database" or "a database construction kit". In fact FDB was an awesome modular building block for all kinds of distributed systems, not just databases. We hacked up prototypes for a whole bunch of these but sadly never got around to releasing them.

Spanner is a full-fledged enterprise grade database with opinions about your data model, query language, types, etc. For the vast majority of customers, that's much more useful than what FDB provided. But for me as somebody who enjoys kicking around silly new ideas for distributed systems, it's a bit less fun.

link

abdullin 3402 days ago

Monolithic databases (CockroachDB, Spanner etc) don't eliminate abstraction boundaries on the larger scale. They are simply aligned with the database boundary, pushing the burden of crossing them to the application logic. Software will still have to pay the price of crossing the gap with code complexity and performance (object-relational impedance mismatch comes to my mind).

It feels like the building block approach lets you achieve better design and performance for your entire application in the long run. Especially, if you can treat building blocks as blueprints and modify them to fit the task at hand.

link

evanweaver 3402 days ago

"When I built a query language and optimizer on top of FoundationDB, my inability to push type information down into the storage engine was about the last thing on my mind."

What was on your mind? What performance problems did you encounter?

link