by danielsf 3720 days ago
Author here: we scraped every script website on the Internet. First we tried to normalize the dataset by only doing stats on the top 1,000 box office films, but we were missing too many scripts. So we decided to go big and then display a cut of the data that's only films in the top 2,500 box office (we had scripts for about half of those).

We're aware of sampling error and the potential for cherry-picking, but we also struggled to figure out what a representative sample would be.

1 comment

How do scripts end up on these script websites? Is it fair to assume that whether a movie's script is online is random?

If you go by box office success, it seems to me you already introduce the bias of consumer preferences rather than the choices of the movie industry. Wouldn't it be better to go by production costs (and marketing budget, if that is not included in production costs)? Although over time one would hope the industry's choices would reflect consumer preferences.