| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by fabianlindfors 390 days ago

> What this project is doing allows one to represent each row (tuple) as a FDB key/value, giving you finer-grained control. But it comes at a cost of having to implement all the access methods, including index scans, in terms of FDB.

That's absolutely right! mvsqlite is a cool project but it doesn't make good use of FoundationDB in a way, which I find a bit unfortunate what a nice piece of software it is!

> This is presumably why database metadata (DDL) isn't currently persisted, because those structures aren't normal tables.

Yes, although perhaps a fun fact that all database metadata in Postgres is actually stored as standard tables, the so called system catalog: https://www.postgresql.org/docs/current/catalogs.html.

I'm still mulling over how to implement DDL persistence but one possible way would be to change the actual system catalog tables to be backed by FDB instead, and rely on some cache on the nodes to avoid round tripping to FDB to get metadata for each query.

As you can imagine though the system catalog is quite deeply intertwined with Postgres as a whole so remains to be seen if this is even doable. The alternative would be a more complicated design where the data is stored in some custom format in FDB and then synced by each node into the system catalog.

1 comments

atombender 388 days ago

That sounds complicated. Presumably you could use watches to quickly get notified on catalog changes, but it suggests that any node could potentially have outdated metadata leading to weird and subtle consistency issues.

With some things like statistics, maybe not a big problem, but with things like dropping columns from tables, you could end up in a situation where a query sees the old table definition.

link

fabianlindfors 387 days ago

Certainly complicated! This is one of those places where I don't think Postgres compatibility is achievable or even desirable, schema changes in a distributed databases are a whole different beast (and very complicated in large Postgres deployments as well).

It will probably end up being something similar to online schema changes in Cockroach where the changes that can be made are limited for safety and run as a background job that can be tracked: https://www.cockroachlabs.com/docs/stable/online-schema-chan...

I've got a bit of experience with schema changes from building Reshape: https://github.com/fabianlindfors/reshape. I'm hoping to transfer over some concepts from there into pgfdb eventually!

link