Hacker News new | ask | show | jobs
by hodgesrm 2117 days ago
Adding columns can be hard at scale. MongoDB allows you to add columns incrementally without stopping the world. You just add the value and it's materialized in the DBMS. The point is you don't have to manually define schema but instead the database works with the structure of the data.

I'm a dyed-in-the-wool RDBMS user but I can see the value of this feature. The trade-off, of course, is that your application has to handle varying schema levels. MongoDB also won't protect you against typos, inconsistent types, and other foolishness. I would not judge people who choose to make this trade--it's a sensible one for many use cases.

Many analytic databases are headed in this direction due to the amount of data that arrives in the form of nested JSON structures. I can't speak for other DBMS types but it's something we're very interested in for ClickHouse.

2 comments

Isn't adding a column even easier in column-oriented OLAP databases?

I agree that more complex row formats are needed though. Bigquery has done well with nested/repeating structures and Snowflake uses the PAX data format for JSON which has been very useful (however their JSON/VARIANT column doesn't support structured types).

It's definitely easier than conventional RDBMS. Adding a column in ClickHouse is just a metadata operation. BiqQuery has great nested structure support.

Still, it's hard to beat MongoDB in this respect. In my first app I was amazed that I could just insert a BSON object and MongoDB created a queryable table automatically. You pay for it of course in other ways but the ease of use is quite extraordinary.

you can use JSON schema to lockdown you schemas like SQL but few developers choose do that as they prefer the flexibility that MongoDB offers. Changes in the schema are almost always due to upstream changes so the schema is a follower rather than a leader.

Encoding your business logic in the database schema is rarely a good idea.

> Encoding your business logic in the database schema is rarely a good idea.

If by this you mean not defining tables with explicit columns in general I would disagree. It's one of the most effective ways to get speed in analytic applications because it optimizes compression. And in OLTP applications it's the best way to ensure consistency of data.

But perhaps you were referring to a narrower scope like just JSON & upstream changes?

No typing is great whether its BSON types in MongoDB or column types in RDBMS. Anything past that generally gets in the way of flexibility to respond to new business requirements. Business changes rarely originate at the schema level so its a bad place to encode business semantics.
This just isn't true for a lot of use cases. If your requirement is to get low latency response on large data sets you need to cluster data carefully and compress it to reduce I/O. That's why data warehouses use column storage and strong typing.

Compressed storage size for uniform datatypes can be phenomenally efficient. I've seen 10,000x size reduction in ideal cases like monotonically varying integers stored using double delta codec + ZSTD compression.

There's a vast difference between record shape and property types vs business logic.

Storing a record with strong typing does not get in the way of business, and in fact it often helps by maintaining data consistency and integrity.