by Beefin
2143 days ago
I wouldn't say arrays are bad, and the entire paradigm of data modeling in MongoDB is to store your data based on your application usage patterns. If you have to query across multiple collections via a $lookup, then maybe you'd benefit from embedding the smaller of those collections into the larger one.
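To make the trade-off concrete, here's a rough sketch of the two options as plain Python dicts (no server needed). The collection and field names (`posts`, `authors`, `author_id`) are made up for illustration and aren't from any real schema.

```python
# Option 1: two collections joined at query time with a $lookup stage.
# "posts" and "authors" are hypothetical collection names.
lookup_pipeline = [
    {
        "$lookup": {
            "from": "authors",           # collection to join against
            "localField": "author_id",   # field on the posts documents
            "foreignField": "_id",       # field on the authors documents
            "as": "author",              # joined docs land in this array
        }
    },
]
# With pymongo this would be run as db.posts.aggregate(lookup_pipeline).

# Option 2: embed the smaller document so one read returns everything.
embedded_post = {
    "_id": 1,
    "title": "Modeling for your access pattern",
    "author": {"_id": 42, "name": "Ada"},  # copied in from "authors"
}
```

Embedding trades the join for duplicated data, so it tends to fit best when the embedded part is small and rarely changes.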
One example: we used MongoDB to track the state of a general CSV import system. We'd have a document for each CSV file a user imported, and on that document we stored the errors that occurred during the import, to display to the user later. In an array, of course. Worked great for years, until one day a user uploaded a very bad CSV, non-maliciously, with hundreds of thousands of lines and dozens of errors on each line, generating an array millions of items long. The failure condition here was wild: the import just got slower, and slower, and slower, until eventually the (modestly provisioned) db cluster started failing. We immediately normalized that array into its own collection and re-ran the import. It still generated millions of errors, but with no problem.
As always, a general rule doesn't apply in every situation, but in my experience, unless you have a really strong grasp on how a system's use will scale, years into the future, unbounded arrays are icky. Lookups across two collections are only a modest performance loss over a direct query, and if the arrays get up there in size, they can actually be faster.
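For anyone who wants to picture the change, here's a minimal sketch of the two shapes using plain Python dicts in place of real documents. The field names (`import_id`, `line`, `message`, `error_count`) are invented for the example, not our actual schema.

```python
def embedded_shape(import_id, errors):
    """One document per import; every error lives in an array on it."""
    return {"_id": import_id, "status": "failed", "errors": list(errors)}


def normalized_shape(import_id, errors):
    """One small import document plus one small document per error."""
    import_doc = {
        "_id": import_id,
        "status": "failed",
        "error_count": len(errors),  # cheap summary, stays a fixed size
    }
    # Each error points back at its import, like a foreign key.
    error_docs = [{"import_id": import_id, **err} for err in errors]
    return import_doc, error_docs


errors = [{"line": n, "message": "bad value"} for n in range(1, 1001)]

embedded = embedded_shape("imp-1", errors)
import_doc, error_docs = normalized_shape("imp-1", errors)
```

In the embedded shape the one document keeps growing with every error, and MongoDB caps a single document at 16 MB. In the normalized shape the import document stays tiny, and the errors can be written in batches (e.g. `insert_many` in pymongo) and read back with an index on `import_id`.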