| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by cerved 1858 days ago

> Okay, bad example. I should have specified: Insert into an existing table, add the missing columns automatically. There are some special-cases where most database engines can automatically generate schemas, and "SELECT... INTO" is one of those few. Usually only allowed with an empty destination table.

See, I don't think I would want SQL that would modify the schema programatically. And imho, that sounds like an invitation for trouble. It can be done of-course, but, to me, the value of the relational data model is to enforce consistency through constraint. If I don't care about consistency, I may as well just dump the data as schemaless JSON and then periodically turn that semi-structured data into actual relational data.

Note that I said schemad database object.

Normalizing the data model is too complex to do programmatically and sounds ill advised. If you want the advantage of painless schemaless data, dump it as JSON.

> NoSQL in a nutshell! You haven't solved the problem, you've given up and resorted to schemaless tables instead, exactly the "bad" example above with all of its limitations and issues.

Note that I was meant this as a stop-gap solution, which to build on and turn into schema-controlled data. Preferably iteratively and hopefully not in production.

I've done this when I've had semi-structured data that I wanted to dump into the RDBMS at which point there was no schema. But it's a band aid. That data is in my experience best turned into schema controlled data ASAP, otherwise inconsistencies start to creep in. Relations that were not immediately obvious pop-up and I don't think there's hope to let the SQL database carry out such a task programmatically.

There are already tools do a good job in analyzing data, which aid the construction of schemas. There are also tools that make updating them relatively painless, especially if you've already inserted the semi-structured data, as most RDBMS' have decent support for handling JSON these days.

Then again, I'm not sure I'm able to appreciate the language you dream of. To me, it sounds like SQL but where ALTER TABLE and UPDATE/INSERT are conjoined into one query. And I guess it would also deal with simple relations (ie, the ones already present by virtue of arrays, associative arrays etc.) Which, admittedly, I see little point besides having same lines which were in two queries, now in one. Which I imagine would mostly just slow down the query parser and make things hard for the optimizer. Not that I don't appreciate that it might be convenient, it doesn't sound like the added complexity has enough trade-offs. Besides, with dumb relations, you end up with deformalized data, ripe for inconsistencies. Ie, the whole problem with NOSQL data modelling in the first place.

I wonder if such a job is not better put on something that is not the RDBMs, and in my experience, such tooling exists and already does a decent job without putting an extraneous responsibility on the already very complicated RDBMs.

I think I would feel comfortable putting schema generation in the hands of Java/C# applications. Instead I would go the route of dumping the semi-structured data straight into the RDBMs as JSON, if I was unable to to predefine the schema. I would do this in a development environment, then I would analyze the data, define the schema to structure it and then move it into production where the data would be queried in regular SQL.

speaking entirely from my own experience, schemaless data is nothing but a pain that results in much more work in the long run. For me it has been the data modelling equivalent of peeing your pants. First, it's nice and warm but pretty soon it get's cold and unpleasant.

Much like there's a reason we do not make computer program themselves, giving them the responsibility of designing the data model seems even unwiser. It's a task that requires intelligence.