Hacker News
by alvare 4546 days ago
What about the Perry Mason thing? That's scary shit.

Theories:

1) Assuming there's some clustering algorithm in use, it could just be some tradeoff edge case that isn't worth optimizing away.

2) It could be that one of their reviewers took an overly aggressive, non-standard approach to genre-tagging Perry Mason and classified a bunch of shows in a way that polluted the source data. This could be something stupid: most other movies have many reviewers, which gives higher confidence, while a large number of Perry Mason shows and DVDs had only a single reviewer who, through a bug or through overweighting, was given too much influence. This seems to be the most likely cause of the bug: some skew or amplification in the source data.

3) Intentional poisoning of the results, like cartographers inserting fictitious features ("trap streets") into maps, or data sellers seeding their data with watermarks.
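To make theory 2 concrete, here's a toy sketch of how a single overweighted reviewer can skew a per-title tag score, and one common way to damp it (a Bayesian-average style shrinkage toward a prior). Everything here is an assumption for illustration: the `TagVotes` type, the idea that tag scores are per-reviewer confidences averaged per title, the prior of 0.1 and the damping constant k=5 are all made up, not how the real system works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TagVotes:
    """Hypothetical record: one title plus each reviewer's confidence (0..1) that a genre tag applies."""
    title: str
    scores: List[float]


def naive_score(v: TagVotes) -> float:
    # Plain mean: a lone reviewer's opinion counts as much as a consensus of many.
    return sum(v.scores) / len(v.scores)


def shrunk_score(v: TagVotes, prior: float = 0.1, k: float = 5.0) -> float:
    # Shrinkage: acts like k extra "virtual" reviewers who all report the prior,
    # so titles with few real reviewers get pulled strongly toward the prior.
    n = len(v.scores)
    return (sum(v.scores) + k * prior) / (n + k)


votes = [
    TagVotes("popular film", [0.9, 0.8, 0.85, 0.9, 0.95, 0.8, 0.9, 0.85]),
    TagVotes("Perry Mason DVD", [1.0]),  # only one (over-eager) reviewer
]

for v in votes:
    print(f"{v.title}: naive={naive_score(v):.2f} shrunk={shrunk_score(v):.2f}")
```

With the plain mean, the single-reviewer DVD outranks the well-reviewed film (1.00 vs 0.87); with shrinkage it drops to 0.25 vs 0.57. If the real pipeline lacked something like this, or had a bug that bypassed it, a handful of sparsely reviewed titles could dominate a tag.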

Sounds like a data-cleanup issue. It's an outlier that should probably be ignored.