Hacker News new | ask | show | jobs
by tabtab 2460 days ago
I'd like to see "dynamic relational" implemented. It's conceptually very similar to existing RDBMS and can use SQL (with some minor variations for comparing more explicitly). You don't have to throw away your RDBMS experience and start over.

And you can incrementally "lock it down" so that you get RDBMS-like protections when projects mature. For example, you may add required-field constraints (non-blank) and type constraints (must be parsable as a number, for instance). Thus, it's good for prototyping and gradually migrating to production. It may not be as fast as an RDBMS for large datasets, though. But that's often the price for dynamicness. (A fancy version could allow migrating or integrating tables to/with a static system, but let's walk before we run.)

https://stackoverflow.com/questions/66385/dynamic-database-s...

Some smaller university out there can make a name for themselves by implementing it. I've been kicking around doing it myself, but I'd have to retire first.

3 comments

I've done this (commercially!) with PostgreSQL - just start with a single table, with one JSON field, and as you want performance, integrity, etc, add expression indexes, break out frequently used expressions into columns etc. On large tables, obviously there's a cost for this reorganization but you can partition the data first, and only reorg the most recent data (e.g. range partitioning by time).

https://www.google.com/search?q=expression+index+postgres

https://www.postgresql.org/docs/10/ddl-partitioning.html

I am trying this out and I am still on the edge of whether I like it or not.

Create a table with a json column:

  CREATE TABLE Doc (
    id UUID PRIMARY KEY,
    val JSONB NOT NULL
  );
Then later it turns out all documents have user_ids so you add a check constraint and an index:

  ALTER TABLE Doc ADD CONSTRAINT check_doc_val CHECK (
    jsonb_typeof(val)='object' AND
    val ? 'user_id' AND
    jsonb_typeof(val->'user_id')='string'
  );
  CREATE INDEX doc_user_id ON Doc ((val->>'user_id'));
I think the postgres syntax for this is pretty ugly. And if you also want foreign key constraints you still have to move that part of the json out as a real column (or duplicate it as a column on Doc). I am not sure it's even worth it to have postgres check these constraints (vs just checking them in code).

I am also a little worried about performance (maybe prematurely). If that document is large, you will be rewriting the entire json blob each time you modify anything in it. A properly normalized schema can get away with a lot less rewriting?

At this point, I would just bite the bullet, break out a new user_id column and run a migration to populate the column.
Just a note about using uuid as a primary key. Typically you will use a b-tree index, which likes to keep things sorted. So something like a serial number works best, because it is already sorted and will be appended at the end. Otherwise inserting a new column will cause traversal the b-tree all over the place which will hurt performance it you do a lot of inserts.

If you really want to use uuid and care about performance you might prefix it with something that's increasing like a date, or perhaps (did not try it) use hash index (need to be PG 10+).

(We're getting way off topic) but I think the problem with auto increment is that it can't be sharded easily since multiple shards can increment to the same value. If you then try to go back to random ids you're now stuck with 8 bytes which will conflict once every billion items or so. I guess it's pretty extreme premature optimization but I think UUID is nicer for future-proofing at the cost of some performance. (I would love to see benchmarks to know exactly how much performance I am giving up though)

By the way uuidv1 is already prefixed by a timestamp! But unfortunately it doesn't use a sortable version of the time so it doesn't work for clustering the ids into the same page. I think it was really designed for distributed systems where you would want evenly distributed ids anyway.

> I would love to see benchmarks to know exactly how much performance I am giving up though

https://www.youtube.com/watch?v=xrMbzHdPLKM

It ends up being a pitch for Aurora at the end (as with any presentation from AWS folks), but it has tons of useful information for standard Postgres.

In MySQL/MariaDB/Percona InnoDB Galera every writeable replica has an auto increment offset.
Same in postgres[1], and I'm willing to guess every relational database has a way to do it.

[1] https://www.postgresql.org/docs/current/sql-createsequence.h...

It seems to treat "direct" columns different than JSON-ized columns. That bifurcation creates two ways of doing things. Ideally, it should be seamless.
Do you have any resources online for how well this approach works in practice? I've been thinking about doing this in replacing a MongoDB database.
Anecdotally, it's worked out well. Before storing a JSON object, we validate it at the app-level using a JSON schema (which has types, required keys, etc). This lets us write without worrying to much.

Once we felt that the schema wasn't changing as much, had too many concerns, etc, we made tables and wrote to them (instead of the JSON column).

Could you please elaborate on why do you want to replace MongoDB?
Have you seen RedBean ORM? https://www.redbeanphp.com/index.php
I don't really want an automatically generated static RDBMS, I'd like a dynamic RDBMS. For one, one can use it with any programming language, not just PHP.
You can create indices in MongoDB on JSON, works pretty well.

And since 4.x they have transactions over shards too!

But the "interface" to the JSON data is different than regular columns, and arguably awkward. The dynamic relational assumption is that one can use regular SQL just like with regular "static" tables (with some minor differences to make compare-type more explicit).
I find SQL much more awkward than proper language-idiomatic APIs.

Yes, SQL shines when you can whip up a big multi-table join, with nice group by-s and wheres, havings and whatnots, but that's rare. Though, arguably, ORMs solved most of the crazy string concatenation query crafting craziness.