| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tabtab 2460 days ago

I'd like to see "dynamic relational" implemented. It's conceptually very similar to existing RDBMS and can use SQL (with some minor variations for comparing more explicitly). You don't have to throw away your RDBMS experience and start over.

And you can incrementally "lock it down" so that you get RDBMS-like protections when projects mature. For example, you may add required-field constraints (non-blank) and type constraints (must be parsable as a number, for instance). Thus, it's good for prototyping and gradually migrating to production. It may not be as fast as an RDBMS for large datasets, though. But that's often the price for dynamicness. (A fancy version could allow migrating or integrating tables to/with a static system, but let's walk before we run.)

https://stackoverflow.com/questions/66385/dynamic-database-s...

Some smaller university out there can make a name for themselves by implementing it. I've been kicking around doing it myself, but I'd have to retire first.

3 comments

asah 2460 days ago

I've done this (commercially!) with PostgreSQL - just start with a single table, with one JSON field, and as you want performance, integrity, etc, add expression indexes, break out frequently used expressions into columns etc. On large tables, obviously there's a cost for this reorganization but you can partition the data first, and only reorg the most recent data (e.g. range partitioning by time).

https://www.google.com/search?q=expression+index+postgres

https://www.postgresql.org/docs/10/ddl-partitioning.html

zawerf 2460 days ago

I am trying this out and I am still on the edge of whether I like it or not.

Create a table with a json column:

  CREATE TABLE Doc (
    id UUID PRIMARY KEY,
    val JSONB NOT NULL
  );

Then later it turns out all documents have user_ids so you add a check constraint and an index:

  ALTER TABLE Doc ADD CONSTRAINT check_doc_val CHECK (
    jsonb_typeof(val)='object' AND
    val ? 'user_id' AND
    jsonb_typeof(val->'user_id')='string'
  );
  CREATE INDEX doc_user_id ON Doc ((val->>'user_id'));

I think the postgres syntax for this is pretty ugly. And if you also want foreign key constraints you still have to move that part of the json out as a real column (or duplicate it as a column on Doc). I am not sure it's even worth it to have postgres check these constraints (vs just checking them in code).

I am also a little worried about performance (maybe prematurely). If that document is large, you will be rewriting the entire json blob each time you modify anything in it. A properly normalized schema can get away with a lot less rewriting?

jayd16 2460 days ago

At this point, I would just bite the bullet, break out a new user_id column and run a migration to populate the column.

takeda 2460 days ago

Just a note about using uuid as a primary key. Typically you will use a b-tree index, which likes to keep things sorted. So something like a serial number works best, because it is already sorted and will be appended at the end. Otherwise inserting a new column will cause traversal the b-tree all over the place which will hurt performance it you do a lot of inserts.

If you really want to use uuid and care about performance you might prefix it with something that's increasing like a date, or perhaps (did not try it) use hash index (need to be PG 10+).

zawerf 2460 days ago

(We're getting way off topic) but I think the problem with auto increment is that it can't be sharded easily since multiple shards can increment to the same value. If you then try to go back to random ids you're now stuck with 8 bytes which will conflict once every billion items or so. I guess it's pretty extreme premature optimization but I think UUID is nicer for future-proofing at the cost of some performance. (I would love to see benchmarks to know exactly how much performance I am giving up though)

By the way uuidv1 is already prefixed by a timestamp! But unfortunately it doesn't use a sortable version of the time so it doesn't work for clustering the ids into the same page. I think it was really designed for distributed systems where you would want evenly distributed ids anyway.

takeda 2459 days ago

> I would love to see benchmarks to know exactly how much performance I am giving up though

https://www.youtube.com/watch?v=xrMbzHdPLKM

It ends up being a pitch for Aurora at the end (as with any presentation from AWS folks), but it has tons of useful information for standard Postgres.

pas 2460 days ago

In MySQL/MariaDB/Percona InnoDB Galera every writeable replica has an auto increment offset.

takeda 2459 days ago

Same in postgres[1], and I'm willing to guess every relational database has a way to do it.

[1] https://www.postgresql.org/docs/current/sql-createsequence.h...

tabtab 2460 days ago

It seems to treat "direct" columns different than JSON-ized columns. That bifurcation creates two ways of doing things. Ideally, it should be seamless.

ucarion 2460 days ago

Do you have any resources online for how well this approach works in practice? I've been thinking about doing this in replacing a MongoDB database.

cagmz 2460 days ago

Anecdotally, it's worked out well. Before storing a JSON object, we validate it at the app-level using a JSON schema (which has types, required keys, etc). This lets us write without worrying to much.

Once we felt that the schema wasn't changing as much, had too many concerns, etc, we made tables and wrote to them (instead of the JSON column).

pas 2460 days ago

Could you please elaborate on why do you want to replace MongoDB?

thunderbong 2460 days ago

Have you seen RedBean ORM? https://www.redbeanphp.com/index.php

tabtab 2460 days ago

I don't really want an automatically generated static RDBMS, I'd like a dynamic RDBMS. For one, one can use it with any programming language, not just PHP.

pas 2460 days ago

You can create indices in MongoDB on JSON, works pretty well.

And since 4.x they have transactions over shards too!

tabtab 2457 days ago

But the "interface" to the JSON data is different than regular columns, and arguably awkward. The dynamic relational assumption is that one can use regular SQL just like with regular "static" tables (with some minor differences to make compare-type more explicit).

pas 2452 days ago

I find SQL much more awkward than proper language-idiomatic APIs.

Yes, SQL shines when you can whip up a big multi-table join, with nice group by-s and wheres, havings and whatnots, but that's rare. Though, arguably, ORMs solved most of the crazy string concatenation query crafting craziness.