Hacker News new | ask | show | jobs
by jt2190 4597 days ago

  > Did Sarah model the data poorly ("We stored each show as a 
  > document in MongoDB containing all of its nested 
  > information, including cast members").
Yes, they modeled the data poorly.

In this example, we have a TV Show, which is modeled as an entity (document). This TV Show has a list of cast members, each one modeled by a nested object.

In a relational database, this type of relationship would be modeled by having a TV_SHOWS table, a CAST_MEMBERS table with a foreign key to the TV_SHOWS table, and a CASCADE DELETE relationship to ensure that if a TV_SHOW is deleted, the related CAST_MEMBER records are also deleted.

This is obviously too strong a relationship between CAST_MEMBERS and TV_SHOWS. (In OO we'd call this a "component" relationship, that is, we're saying that a tv show is composed of cast members, and if we destroy the tv show we destroy the cast members as well.)

They should have modeled CAST_MEMBERS as true entities, by making them documents in their own collection, and storing a list of Cast Member IDs in each TV Show.

  > But if you do break them out into unique entities, then 
  > how to you present the show information without doing 
  > joins?
You must join, albeit in MongoDB you do this in the application layer, not the database, so:

1. Query the cast members collection to find the cast member id. 2. Query the tv shows collection to find all tv shows with cast member id in the cast members set.

Those of us who sharpened our teeth using relational databases have trouble seeing past "two trips to the database" in the above strategy, and that's probably why there's an urge to embed documents rather than to query two collections sequentially. Resist this urge, as it's as as bad as the urge to denormalize, i.e. there'd better be a damn good reason to do it.

1 comments

> This is obviously too strong a relationship between CAST_MEMBERS and TV_SHOWS.

... huh?

> They should have modeled CAST_MEMBERS as true entities, by making them documents in their own collection, and storing a list of Cast Member IDs in each TV Show.

So instead of a one-to-many relationship, they should use a one-to-many relationship expressed in a different notation?