|
You have to separate between "Mongo the database" and "Mongo as it's used by companies"; the latter causes far more problems than the former. I last used MongoDB seriously in 2012-2015. We had myriad operations problems including inconsistent indexing across shards (where some shards had an index created and others didn't, it was baffling), issues with the balancer not moving chunks properly, and more. Also it's just different than other DBs with its lack of transactional consistency (I think they've made progress on building this), but that's part of why it's fast. However, the bigger problem is that document databases -- in general -- enable a kind of software development where the model sort of emerges over time, rather than being carefully designed from the beginning. Yes, it's flexible, but you pay an absolutely enormous cost down the line dealing with inconsistent documents. It's not like code where if you do something stupid, you can fix it over time with refactoring and "remodeling" -- data has mass. You can get into a situation where, with a large data set, it can take a week or more just to run the migration script required to scan an entire collection and rewrite a few billion documents into a new, better format. There is no such thing as a "schemaless" database. That's like saying, oh sure, we just have a bunch of 1s and 0s in memory -- our data is "structureless". The question is whether the database enforces the schema, or not. And I think that in a lot of cases, it's a lot worse to have an "uncodified schema" than a rigid, but at least well-defined, one, that's consistent across the data at all points. Sidenote: It's also occurred to me over the past few years that it's almost impossible to impose a consistent schema on a large enough dataset. If you truly are dealing with "big data" (TB/PB scale) maybe go straight to the document store of columnar because doing a migration is outright impossible, but don't be so quick to write it off for GB-scale datasets. |
A challenge there is determining whether it makes sense to massage the data into a common schema for further analysis or to use an unstructured initial approach from the beginning. Sometimes you get to the former from the latter.