by gfodor 4883 days ago
Of course, the problem with that approach is that you don't have anything enforcing any sort of data integrity below the application. In my experience, most of the time you can actually put down on paper a schema and a set of rules the data should obey without too much fear of it changing dramatically. The nice thing about hstore is that it allows you the flexibility to introduce unstructured data in just the places where a schema is unknowable or not worth the complexity.

MongoDB et al. are basically built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades' worth of database research.


> MongoDB et al. are basically built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades' worth of database research.

Unless MongoDB et al. are saying "always use MongoDB et al. and never an RDBMS", I'm not sure how you arrived at the conclusion that "the schema is never worth the complexity."

If anything, the appropriate assumption is, "schemas aren't always worth the complexity." When they are, you use an RDBMS. When they aren't, you don't bother with the data integrity constraints.

The "right tool for the job" mantra, often cited to justify running N different data stores for different use cases, heavily discounts the true implication of running multiple data stores: you have to run multiple data stores. You have more ways to get burned by your lack of expertise. You need more eyeballs for the same amount of confidence in your system, since those will probably need to be different types of experts. You need to know how to monitor and tune each store. Debating which data store to use for a given use case becomes a constant drag on discussions. There is less consistency in modeling since you have to work with multiple paradigms. Your software needs to be built to deal with multiple data stores. All your export/import/backup/etc. software efforts that are 1-to-1 with each data store need to be multiplied.

The bottom line is if you drop in a second data store because you have a few fields in your database that are a pain to model with a schema, you are doing yourself a disservice compared to just doing ALTER COLUMN foo hstore.
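
For illustration, here is a minimal sketch of that hybrid approach: a fixed schema for the fields you understand, plus one free-form column for the rest. It is a stand-in rather than the real thing. It assumes SQLite with a JSON-encoded TEXT column in place of PostgreSQL with hstore, so it runs with only the Python standard library, and the table and column names (users, email, extras) are hypothetical. In PostgreSQL the equivalent step would be roughly CREATE EXTENSION hstore followed by ALTER TABLE users ADD COLUMN extras hstore.

```python
import json
import sqlite3

# Stand-in for PostgreSQL + hstore: SQLite with a TEXT column holding JSON.
# Table and column names (users, email, extras) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id     INTEGER PRIMARY KEY,
        email  TEXT NOT NULL UNIQUE,       -- integrity enforced by the database
        extras TEXT NOT NULL DEFAULT '{}'  -- free-form key/value bag
    )
    """
)


def add_user(email, **extras):
    """Insert a row: known fields go in columns, everything else in extras."""
    cur = conn.execute(
        "INSERT INTO users (email, extras) VALUES (?, ?)",
        (email, json.dumps(extras)),
    )
    return cur.lastrowid


def get_extras(user_id):
    """Return the unstructured key/value pairs stored for one user."""
    row = conn.execute(
        "SELECT extras FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0])


# Unstructured attributes need no schema change.
first = add_user("a@example.com", theme="dark", beta_tester="yes")

# The structured part is still protected below the application layer.
try:
    add_user("a@example.com")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The point of the sketch is that the UNIQUE and NOT NULL rules keep being enforced by the one database you already run, while the extras column absorbs the fields that are a pain to model.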

My colleague mcfunley wrote an article about this blind spot when people talk about these issues:

http://mcfunley.com/why-mongodb-never-worked-out-at-etsy

While I agree that is often a blind spot, it is a red herring with respect to your statement:

> MongoDB et al. are basically built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades' worth of database research.

You may well argue that if you have N-1 applications using PostgreSQL, and the Nth application could, on its own, justifiably use MongoDB, then it is still appropriate to use PostgreSQL rather than add Yet Another DB Engine.

But that is nothing more than a specific case that is often ignored in the "best tool for the job" mantra. It does not mean that schemas are never worth the complexity of an RDBMS.

All I'm saying is that you can't claim that a recommendation of MongoDB assumes schemas are never worth the complexity; you can only claim that the assumption is that they are sometimes not worth the complexity.

More generally, MongoDB makes no assumption that contradicts "years of DB research."