HN Mirror



	by jessaustin 4644 days ago
	Another "issue" occurs to me. It seems likely that the data coming in about TV shows, especially old ones with decades of episodes, would be a bit "dirty". This sort of thing slides right into a document store, but a relational database would have trouble with it. How do we know, e.g., that "Bryan Cranston", "Bryan Lee Cranston", and "Brian Cranston" are the same (or different) actors? Of course these things can be fixed with enough manual (or, even better, user) intervention, but the time and place for that is after you've got the data into the system, not before.
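A minimal sketch of "get the data in first, fix it later": once raw credited names are stored, likely duplicates can be flagged for manual or user review instead of being merged automatically. It uses the standard-library `difflib.SequenceMatcher`; the function names, the normalization rule, and the 0.8 threshold are all assumptions chosen for illustration, not a recommended setting.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    # Lowercase and collapse runs of whitespace so formatting noise is ignored.
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_duplicates(names, threshold=0.8):
    """Return (name, name, score) pairs that look alike enough to need review.

    This only flags candidates (threshold is an arbitrary assumption);
    deciding whether two names are the same person is left to a human
    (or user) after the data is already stored.
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    credited = ["Bryan Cranston", "Bryan Lee Cranston", "Brian Cranston", "Aaron Paul"]
    for a, b, score in candidate_duplicates(credited):
        print(f"{a!r} ~ {b!r}: {score:.2f}")
```

At this threshold all three Cranston spellings are paired with each other while "Aaron Paul" is not; a string-similarity score can only suggest that two credits might be the same person, so the final call still belongs to a reviewer.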

1 comment

> How do we know e.g. that "Bryan Cranston", "Bryan Lee Cranston", and "Brian Cranston" are the same (or different) actors?

In the USA, the various professional creative guilds enforce uniqueness of their members' professional names.

Your general musing is right, but the problem of source-data quality is generally considered to be distinct from the design of schemata.
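One way to keep those two concerns apart is a schema that accepts source data as-is and records cleanup decisions separately. Below is a minimal sketch using Python's built-in `sqlite3`; the tables `raw_credit`, `person`, and `alias`, the helper functions, and the "Show A"/"Show B" sample rows are all hypothetical. Raw credits are stored without constraints, `person.name` is `UNIQUE` (mirroring guild-enforced name uniqueness), and `alias` records which source spellings have been resolved to which person.

```python
import sqlite3

# Hypothetical schema: dirty input is never rejected, cleanup lives in `alias`.
SCHEMA = """
CREATE TABLE raw_credit (          -- source rows, stored exactly as received
    show TEXT, episode TEXT, credited_name TEXT
);
CREATE TABLE person (              -- one row per performer
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE alias (               -- resolved spelling -> person
    credited_name TEXT PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id)
);
"""


def build(conn):
    conn.executescript(SCHEMA)


def resolve(conn, canonical_name, spellings):
    """Record a human (or user) decision that several spellings are one person."""
    conn.execute("INSERT OR IGNORE INTO person (name) VALUES (?)", (canonical_name,))
    (person_id,) = conn.execute(
        "SELECT id FROM person WHERE name = ?", (canonical_name,)
    ).fetchone()
    conn.executemany(
        "INSERT OR REPLACE INTO alias (credited_name, person_id) VALUES (?, ?)",
        [(s, person_id) for s in spellings],
    )
    return person_id


def unresolved(conn):
    """Credited names that have not yet been mapped to a person."""
    rows = conn.execute("""
        SELECT DISTINCT r.credited_name
        FROM raw_credit AS r
        LEFT JOIN alias AS a ON a.credited_name = r.credited_name
        WHERE a.person_id IS NULL
        ORDER BY r.credited_name
    """).fetchall()
    return [name for (name,) in rows]


def credits_for(conn, person_name):
    """(show, episode) rows credited to a person under any resolved spelling."""
    return conn.execute("""
        SELECT r.show, r.episode
        FROM raw_credit AS r
        JOIN alias AS a ON a.credited_name = r.credited_name
        JOIN person AS p ON p.id = a.person_id
        WHERE p.name = ?
        ORDER BY r.show, r.episode
    """, (person_name,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    # Made-up sample rows; the spellings come from the thread above.
    conn.executemany("INSERT INTO raw_credit VALUES (?, ?, ?)", [
        ("Show A", "E1", "Bryan Cranston"),
        ("Show B", "E1", "Bryan Lee Cranston"),
        ("Show A", "E2", "Brian Cranston"),
    ])
    print(unresolved(conn))  # ['Brian Cranston', 'Bryan Cranston', 'Bryan Lee Cranston']
    resolve(conn, "Bryan Cranston", ["Bryan Cranston", "Bryan Lee Cranston", "Brian Cranston"])
    print(unresolved(conn))  # []
    print(credits_for(conn, "Bryan Cranston"))
```

The schema itself does not change as the data gets cleaner; only the rows in `alias` grow, which is one concrete reading of "source-data quality is distinct from schema design."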