Hacker News new | ask | show | jobs
by conradfr 2108 days ago
I interviewed recently at a payment provider that is rewriting its PHP/Mysql monolith in Java & go microservices with MongoDb.

The architect would praise static typing but would prefer MongoDb "because it's easier to add a column". It felt weird but I've never used MongoDb so I could not really argue about it.

2 comments

Sounds like a bad architect or someone who only knows surface features and doesn't have experience with the actual databases.

Adding a column is not hard in any relational database, and pretty much all modern ones support no-downtime transactional schema updates with backfills, concurrent index builds, etc.

Also the schema always exists somewhere, and it's usually to put it in the database so it's next to (and validates) your data rather than keeping spread out in your application code.

Adding columns can be hard at scale. MongoDB allows you to add columns incrementally without stopping the world. You just add the value and it's materialized in the DBMS. The point is you don't have to manually define schema but instead the database works with the structure of the data.

I'm a dyed-in-the-wool RDBMS user but I can see the value of this feature. The trade-off, of course, is that your application has to handle varying schema levels. MongoDB also won't protect you against typos, inconsistent types, and other foolishness. I would not judge people who choose to make this trade--it's a sensible one for many use cases.

Many analytic databases are headed in this direction due to the amount of data that arrives in the form of nested JSON structures. I can't speak for other DBMS types but it's something we're very interested in for ClickHouse.

Isn't adding a column even easier in column-oriented OLAP databases?

I agree that more complex row formats are needed though. Bigquery has done well with nested/repeating structures and Snowflake uses the PAX data format for JSON which has been very useful (however their JSON/VARIANT column doesn't support structured types).

It's definitely easier than conventional RDBMS. Adding a column in ClickHouse is just a metadata operation. BiqQuery has great nested structure support.

Still, it's hard to beat MongoDB in this respect. In my first app I was amazed that I could just insert a BSON object and MongoDB created a queryable table automatically. You pay for it of course in other ways but the ease of use is quite extraordinary.

you can use JSON schema to lockdown you schemas like SQL but few developers choose do that as they prefer the flexibility that MongoDB offers. Changes in the schema are almost always due to upstream changes so the schema is a follower rather than a leader.

Encoding your business logic in the database schema is rarely a good idea.

> Encoding your business logic in the database schema is rarely a good idea.

If by this you mean not defining tables with explicit columns in general I would disagree. It's one of the most effective ways to get speed in analytic applications because it optimizes compression. And in OLTP applications it's the best way to ensure consistency of data.

But perhaps you were referring to a narrower scope like just JSON & upstream changes?

No typing is great whether its BSON types in MongoDB or column types in RDBMS. Anything past that generally gets in the way of flexibility to respond to new business requirements. Business changes rarely originate at the schema level so its a bad place to encode business semantics.
This just isn't true for a lot of use cases. If your requirement is to get low latency response on large data sets you need to cluster data carefully and compress it to reduce I/O. That's why data warehouses use column storage and strong typing.

Compressed storage size for uniform datatypes can be phenomenally efficient. I've seen 10,000x size reduction in ideal cases like monotonically varying integers stored using double delta codec + ZSTD compression.

There's a vast difference between record shape and property types vs business logic.

Storing a record with strong typing does not get in the way of business, and in fact it often helps by maintaining data consistency and integrity.

The comparison here is DB schema + DB schema migration + shared data access layer vs shared data access layer With that I am not clear on what you think the problem is without domain knowledge.

I can understand broadly that RDBMS is better when you have joins and foreign key constraints and nosql is cheaper “web” scale plus there are ACID considerations yet without domain knowledge it’s plausible nosql is better

"Web scale" isn't a real thing and I don't know what "shared data access layer" refers to but relational databases have decades of features and tooling that prove to be very useful in most cases. Even if you don't use ACID, transactions, key relations and other features, maintaining the basic type information in the database still has numerous advantages in data consistency, integrity and performance.

Also everything can scale, and it's all using the same fundamental primitives (sharding, etc) to do so anyway. Some just make it easier with built-in functionality vs external layers.

This is not to say that non-relational systems aren't useful, but that they are rarely used correctly instead of chosen for marketing hype, and hearing things like "adding a column is easier" from an architect usually points to the later situation.

Database schemas only validate the data type and whether it's null or not.

It's next to useless for ensuring data integrity and why every app you see will have data validation inside the code itself. Whether it's checking that an email has a valid structure or that a payment is not negative.

Majority of the logic will be the code so it makes no sense to me other than if you have multiple clients accessing that database why it must be enforced at the database level as well.

The examples you give are perfectly possible in any decent SQL server (mssql, postgres). If anything, database servers are made to ensure data consistency, checks like "payment not negative" are no brainers. If you don't define these constraints at the database level you are leaving its power on the table and are re-inventing the wheel by putting it in your application layer somewhere.
RDBMS can enforce data integrity through normalization (avoiding update anomalies), are you familiar with these? https://www.postgresql.org/docs/13/ddl-constraints.html

That doesn't mean that treating your database as a dumb datastore and having the "smarts" in the application layer doesn't work better for some applications. Both ways have trade-offs.

The shape of a record, property types, foreign keys and more are all critical part of a schema.

Business logic validation is a separate layer but still requires proper types and data integrity underneath, and that's where strong schemas in the database help. More so when you have multiple apps interacting with the same database.

In companies with DBAs the constrains are baked into the DB, in places without DBAs (or weak DBAs dev relation) that is absorbed into the code.

To be fair depending on the culture of the company it maybe easier to put in code and update it there than to wait to have it approved, discussed and scheduled by the DBA

Oh man. Payment provider data in mongo... even if by now MongoDB doesnt lose data... you REALLY want your ledger or OLTP to be in a relational, transactional database.
Coinbase uses MongoDB, Barclays uses MongoDB, BBVA uses MongoDB, Capital One uses MongoDB. Charles Schwab uses MongoDB. FICO uses MongoDB. Goldman Sachs uses MongoDB. HSBC uses MongoDB. Intuit uses MongoDB. Uk Inland Revenue uses MongoDB. UK Dept of Work and Pensions uses MongoDB.

This is the tip of the iceberg when it comes to MongoDB's use cases in Financial services. These organisations REALLY use MongoDB for financial data and they are all public references. Relational databases are great for financial data and so is MongoDB.

Stripe started with mongodb https://www.mongodb.com/blog/post/mongodb-powering-the-magic...

Don't know what they use now or if they still use it.

IIRC Stripe relies heavily on Mongo