| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by amw-zero 398 days ago
	> From what I can tell, this doesn't replace the lower level page heap storage, but instead actually provides new implementation of table and indexes These seem contradictory. If the data is stored in FoundationDB, then it won't be stored in the filesystem as blocks, right?

1 comments

atombender 397 days ago

Correct, my point is that it replaces storage at a higher level.

For example, mvsqlite implements a SQLite VFS that maps page to FoundationDB keys. This means that once the VFS is in place, everything is in FDB. But this means that you have to deal with page conflicts if you want multiple writers to mount the same database, so it's a somewhat coarse-grained system.

What this project is doing allows one to represent each row (tuple) as a FDB key/value, giving you finer-grained control. But it comes at a cost of having to implement all the access methods, including index scans, in terms of FDB.

This is presumably why database metadata (DDL) isn't currently persisted, because those structures aren't normal tables.

link

fabianlindfors 397 days ago

> What this project is doing allows one to represent each row (tuple) as a FDB key/value, giving you finer-grained control. But it comes at a cost of having to implement all the access methods, including index scans, in terms of FDB.

That's absolutely right! mvsqlite is a cool project but it doesn't make good use of FoundationDB in a way, which I find a bit unfortunate what a nice piece of software it is!

> This is presumably why database metadata (DDL) isn't currently persisted, because those structures aren't normal tables.

Yes, although perhaps a fun fact that all database metadata in Postgres is actually stored as standard tables, the so called system catalog: https://www.postgresql.org/docs/current/catalogs.html.

I'm still mulling over how to implement DDL persistence but one possible way would be to change the actual system catalog tables to be backed by FDB instead, and rely on some cache on the nodes to avoid round tripping to FDB to get metadata for each query.

As you can imagine though the system catalog is quite deeply intertwined with Postgres as a whole so remains to be seen if this is even doable. The alternative would be a more complicated design where the data is stored in some custom format in FDB and then synced by each node into the system catalog.

link

atombender 394 days ago

That sounds complicated. Presumably you could use watches to quickly get notified on catalog changes, but it suggests that any node could potentially have outdated metadata leading to weird and subtle consistency issues.

With some things like statistics, maybe not a big problem, but with things like dropping columns from tables, you could end up in a situation where a query sees the old table definition.

link

fabianlindfors 394 days ago

Certainly complicated! This is one of those places where I don't think Postgres compatibility is achievable or even desirable, schema changes in a distributed databases are a whole different beast (and very complicated in large Postgres deployments as well).

It will probably end up being something similar to online schema changes in Cockroach where the changes that can be made are limited for safety and run as a background job that can be tracked: https://www.cockroachlabs.com/docs/stable/online-schema-chan...

I've got a bit of experience with schema changes from building Reshape: https://github.com/fabianlindfors/reshape. I'm hoping to transfer over some concepts from there into pgfdb eventually!

link

amw-zero 395 days ago

> this doesn't replace the lower level page heap storage

So this is wrong. The heap storage is replaced.

link

atombender 394 days ago

You miss my point. It's not replacing Postgres' entire block storage with FoundationDB, it's injecting Table Access Methods and Index Access Methods handlers to override tuple storage at the table and index level. These handlers deal with tuples, not blocks.

The author replied to this thread and confirmed this.

link