Hacker News
by stevefan1999 457 days ago
I sort of disagree. KV databases are so fundamental that they are considered one of the most foundational technologies for any advanced database management system.

Think of a KV database as a persistent associative map (hash map) that has to store data safely and securely; we can then build more advanced things on top of it. Take TiDB, for example: it is a distributed, MySQL-compatible database (its query language can be considered a subset of MySQL's), but most of the heavy lifting is actually handled by TiKV, a distributed KV datastore that uses Raft for distributed consensus.
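
To make the "build advanced stuff on top" point concrete, here is a minimal sketch of a table with a secondary index layered over an ordered KV store. Everything in it is an assumption for illustration: `OrderedKV`, the `t/<table>/r/<id>` key layout, and the `users` table are hypothetical and are not TiDB's or TiKV's actual encoding or API.

```python
from bisect import bisect_left, insort


class OrderedKV:
    """Toy in-memory ordered KV store (hypothetical stand-in for a real engine)."""

    def __init__(self):
        self._keys = []  # keys kept in sorted order so prefix scans are cheap
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


def row_key(table, row_id):
    # Hypothetical layout: one KV pair per row.
    return f"t/{table}/r/{row_id:08d}"


def index_key(table, column, value, row_id):
    # Hypothetical layout: one KV pair per (indexed value, row) combination.
    return f"t/{table}/i/{column}/{value}/{row_id:08d}"


def insert_user(kv, row_id, name, city):
    kv.put(row_key("users", row_id), {"name": name, "city": city})
    kv.put(index_key("users", "city", city, row_id), row_id)


def users_in_city(kv, city):
    # Roughly "SELECT name FROM users WHERE city = ?", answered by an index scan.
    prefix = f"t/users/i/city/{city}/"
    return [kv.get(row_key("users", rid))["name"] for _, rid in kv.scan(prefix)]


kv = OrderedKV()
insert_user(kv, 1, "ann", "Oslo")
insert_user(kv, 2, "bob", "Lima")
insert_user(kv, 3, "cy", "Oslo")
print(users_in_city(kv, "Oslo"))  # ['ann', 'cy']
```

The query layer only ever issues `put`, `get`, and prefix scans, which is why the storage underneath can be swapped for a distributed KV store without changing the table logic.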

SurrealDB also leverages TiKV, as one of its storage layers, to build its graph-document hybrid database product. P.S.: I used to be a contributor to SurrealDB.

2 comments

KV databases are also the least efficient architecture possible if your data models or workloads are non-trivial. They are relatively simple to design and build, which is a positive attribute, but they are not that capable in any kind of theoretical sense. Other architectures preserve far more spatial and temporal locality when representing data models.

If your workload has even a whiff of analytics to it, operational or slow-time, KV databases are almost the pathological architecture in theory. Their intrinsically poor locality exacts a steep performance price.

These database architectures are all equivalent in the same sense that almost everything is a Turing Machine. Some manifestations and implementations are much more efficient than others in the real world. While I am not as emotionally invested in it as the article’s author seems to be, he is generally correct that KV databases have poor properties for most applications.

DeepSeek just used FoundationDB to build a parallel filesystem. Parallel filesystems are a big deal -- their number, including proprietary ones, is probably in single digits.

Parallel filesystems aren't a new or novel concept, and there have been lots of implementations.

The first one I encountered was DrFTPD circa 2004. But these days, any object storage system qualifies because they all support varying replication schemes and reading from any valid in-sync replica.

Now we are getting into the definition of what a parallel filesystem is.

In my book, a parallel filesystem is not just pooling together a bunch of nodes, but something that can actually support the synchronized accesses needed by a parallel workload. So not just decoupling between data and metadata, but scaling out of the metadata layer as well.

That and a hierarchical namespace (I could be sold on compromising some POSIX compliance for performance reasons, but it has to fundamentally be a hierarchical namespace with similar semantics). So object stores would not qualify.

Object stores are not filesystems. They have paths, but they are not hierarchical.
unless it's a hierarchical object store, like the one i am using.
thinking more about this, the difference between hierarchical and non-hierarchical storage is the existence of additional indexes representing the hierarchy.

in most filesystems files are addressed by an inode. and directories are just lists of inodes. if you remove those then you end up with a KV store: inode->file.

consequently i see no difficulty in converting a KV store into a hierarchy by adding the necessary tables/directories representing the hierarchy.
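
a minimal sketch of that idea, assuming a plain dict as the KV store: files live under inode keys, and each directory is one extra table mapping names to inodes. the key shapes (`("inode", n)`, `("dir", n)`) and the helper names are made up for illustration, not taken from any real filesystem.

```python
import itertools

kv = {}                    # plain dict standing in for a KV store
_ids = itertools.count(1)  # hypothetical inode number allocator


def alloc(kind, content=b""):
    """Create an inode; directories also get a name -> inode table."""
    ino = next(_ids)
    kv[("inode", ino)] = {"kind": kind, "content": content}
    if kind == "dir":
        kv[("dir", ino)] = {}
    return ino


ROOT = alloc("dir")


def link(parent, name, child):
    """Add a directory entry: the extra index that creates the hierarchy."""
    kv[("dir", parent)][name] = child


def resolve(path):
    """Walk the directory tables from the root to find the inode for a path."""
    ino = ROOT
    for part in filter(None, path.split("/")):
        ino = kv[("dir", ino)][part]
    return ino


def read(path):
    return kv[("inode", resolve(path))]["content"]


home = alloc("dir")
link(ROOT, "home", home)
notes = alloc("file", b"hello")
link(home, "notes.txt", notes)
print(read("/home/notes.txt"))  # b'hello'
```

without the `("dir", n)` tables this collapses back to the flat inode->file store described above.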

In a hierarchical system, renaming a directory is an O(1) operation; in a non-hierarchical system with paths, it is an O(n) operation, where n is the number of keys under that path.
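
A rough sketch of the difference, counting how many entries each rename has to rewrite. Both stores are plain dicts and both function names are hypothetical; a real object store or filesystem would add locking, durability, and so on.

```python
def rename_flat(store, old, new):
    """Rename a path prefix in a flat path-keyed store.

    Every key under the old prefix must be rewritten, so the work grows
    with the number of objects under the directory. Returns keys rewritten.
    """
    old_p = old.rstrip("/") + "/"
    new_p = new.rstrip("/") + "/"
    moved = [k for k in store if k.startswith(old_p)]
    for k in moved:
        store[new_p + k[len(old_p):]] = store.pop(k)
    return len(moved)


def rename_hier(dirs, parent, old_name, new_name):
    """Rename one entry in the parent's directory table.

    The children are addressed by inode, so they are untouched.
    Returns entries rewritten.
    """
    dirs[parent][new_name] = dirs[parent].pop(old_name)
    return 1


# Flat store: 1000 objects whose keys all embed the directory name.
flat = {f"/data/f{i}": b"" for i in range(1000)}
print(rename_flat(flat, "/data", "/archive"))  # 1000

# Hierarchical store: inode 1 is the root, inode 2 is the "data" directory.
dirs = {1: {"data": 2}, 2: {f"f{i}": 100 + i for i in range(1000)}}
print(rename_hier(dirs, 1, "data", "archive"))  # 1
```

The hierarchical rename touches a single directory entry regardless of how many files sit underneath, which is the practical meaning of the O(1) versus O(n) distinction.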