| HN Mirror



	by yorwba 2345 days ago
	It's not exactly true that research institutions don't have access to the same big datasets as companies. For example, I took a course that involved tracking soccer players using videos provided by a streaming company that specializes in amateur soccer. They promised to give us access to their internal API under an NDA, which they wouldn't have done for just anyone. On the other hand, they never actually gave our API keys the necessary privileges, so in the end I just reverse-engineered the URL scheme of their streams and scraped them.

	Many datasets used in academia are just collections of publicly available data (e.g. Wikipedia, images found by googling), optionally annotated on the cheap using Amazon Mechanical Turk. Experimenting with that kind of data is also open to independent researchers. You don't need to work at a data-hoarding company if you can get what you need by scraping their website.