|
|
|
|
|
by towelrod
4600 days ago
|
|
I'm pretty ignorant of MongoDB so I'm genuinely interested in your response: How would you solve the problem in the epilogue, namely "a chronological listing of all of the episodes of all the different shows that actor had ever been in"? Did Sarah model the data poorly ("We stored each show as a document in MongoDB containing all of its nested information, including cast members"). Or is there an easy way to extract that information that Sarah just doesn't know about yet? Keep in mind the constraints in the article, for example: some shows have 20,000+ episodes, actors show up in 100s of shows, and "We had no way to tell, aside from comparing the names, whether they were the same person". The last part seems like a really straightforward relational critique to me. If you don't break the actors out into unique entities then you can't compare them across shows. But if you do break them out into unique entities, then how to you present the show information without doing joins? |
|
In this example, we have a TV Show, which is modeled as an entity (document). This TV Show has a list of cast members, each one modeled by a nested object.
In a relational database, this type of relationship would be modeled by having a TV_SHOWS table, a CAST_MEMBERS table with a foreign key to the TV_SHOWS table, and a CASCADE DELETE relationship to ensure that if a TV_SHOW is deleted, the related CAST_MEMBER records are also deleted.
This is obviously too strong a relationship between CAST_MEMBERS and TV_SHOWS. (In OO we'd call this a "component" relationship, that is, we're saying that a tv show is composed of cast members, and if we destroy the tv show we destroy the cast members as well.)
They should have modeled CAST_MEMBERS as true entities, by making them documents in their own collection, and storing a list of Cast Member IDs in each TV Show.
You must join, albeit in MongoDB you do this in the application layer, not the database, so:1. Query the cast members collection to find the cast member id. 2. Query the tv shows collection to find all tv shows with cast member id in the cast members set.
Those of us who sharpened our teeth using relational databases have trouble seeing past "two trips to the database" in the above strategy, and that's probably why there's an urge to embed documents rather than to query two collections sequentially. Resist this urge, as it's as as bad as the urge to denormalize, i.e. there'd better be a damn good reason to do it.