by K0SM0S
2337 days ago
So, if I understand correctly, and to put it in my own words: while the "big data" (datasets) collected and thus owned by big tech, big ads, big brother, etc. may be instrumental in building at-scale solutions for real-world use (for profit, knowledge, control, or whatever actionable goal), fundamental research itself, as done in universities, can move forward without these datasets; what's publicly available is enough. Did I read this right? That would add much-needed nuance to the common perception that big data is necessary to train innovative models, and that a few champions of data collection hold some sort of monopoly on the oil (data, the 'fuel' of ML).
On the other hand, they never actually gave our API keys the necessary privileges, so in the end I just reverse-engineered the URL scheme of their streams and scraped them. Many datasets used in academia are just collections of publicly available data (e.g. Wikipedia, images found by googling), optionally annotated cheaply via Amazon Mechanical Turk. Experimenting with that kind of data is open to independent researchers too. You don't need to work at a data-hoarding company if you can get what you need by scraping its website.