by throwawaysnxcv2 1131 days ago
Even if cast lists aren't covered by copyright, they still need to source the data from somewhere. If I were responsible for this at Netflix, I would prefer to buy that data from a good-quality source instead of scraping it from external websites or running OCR on the credits screen.
1 comment

Netflix adds on average about 1 show per day. It would be incredibly cheap to have an intern just watch the credits and write down the names. Maybe you get a second intern to double-check their work. Sure, buying access to an already compiled database would be preferable to making your own, but even that's pretty cheap. For IMDb, a million API requests per day costs $45/month. If we assume 3600 shows, 5 seasons per show, 10 episodes per season, and 300 credits per episode, that's 54 million requests, which works out to less than 2 months of API requests, or about $90 total to create a database for their entire catalogue (substantially less if you can retrieve multiple credits with a single API request). I've got more than that in my wallet right now. Of course, you'd probably need an engineer to write a script to make all those API requests, and at that scale it will take a little time to write and some effort to maintain. So, less facetiously, the real cost is probably comparable to that of a few interns. Still, this is definitely not an insurmountable barrier for a multibillion-dollar company.
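
For anyone who wants to check the back-of-envelope math, here's a rough sketch in Python. All the inputs are the guesses from the comment above (not real Netflix or IMDb figures), and the 30-day month and "billed per whole month" rounding are my own assumptions for illustration.

```python
import math

# Rough inputs taken from the comment above; none of these are verified figures.
SHOWS = 3600
SEASONS_PER_SHOW = 5
EPISODES_PER_SEASON = 10
CREDITS_PER_EPISODE = 300

REQUESTS_PER_DAY = 1_000_000   # quota quoted in the comment
PRICE_PER_MONTH_USD = 45       # price quoted in the comment
DAYS_PER_MONTH = 30            # assumption: a flat 30-day billing month


def estimate():
    """Return (total_requests, days_needed, months_billed, total_cost_usd)."""
    # Worst case: one API request per individual credit.
    total_requests = (
        SHOWS * SEASONS_PER_SHOW * EPISODES_PER_SEASON * CREDITS_PER_EPISODE
    )
    days_needed = total_requests / REQUESTS_PER_DAY
    # Assumption: you pay for whole months, so round up.
    months_billed = math.ceil(days_needed / DAYS_PER_MONTH)
    total_cost_usd = months_billed * PRICE_PER_MONTH_USD
    return total_requests, days_needed, months_billed, total_cost_usd


if __name__ == "__main__":
    total, days, months, cost = estimate()
    print(f"{total:,} requests, {days:.0f} days, {months} months, ${cost}")
```

If a single request can return a whole episode's credits, drop `CREDITS_PER_EPISODE` to 1 and the whole thing fits in a fraction of one day's quota.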