by sp4ke
3140 days ago
On the other hand, think about it: what do Google and FB have that we don't? Personal data. What they have is data that is useful to target ads. Yeah? Like the tons of photos they harvest from people. Most of the progress they have made in training computer vision models is based on that. Should I build Facebook or Google to get access to it? What about language modeling? They have access to conversational data and billions of search queries, neither of which can be accessed from outside. What about health? Well, if I'm not somehow working with some big pharma company, how could I access this kind of data? I can go on and on. The point is, yes, I can crawl the web, but what "web" is there left? Everything is locked behind paywalls and private clouds. If the real vision of an open internet had been fulfilled, all data generated on it would indeed be accessible to crawl. I'm not saying it's not possible to get data and use it. I'm saying you cannot get the kind of data only monopolies have, and you will never be able to compete with them.
Language modeling: Hacker News, public mailing lists, Wikipedia, GitHub.
Health: you can usually get data if you work at a hospital as an MD or researcher. You just need a reasonable idea and IRB approval. If you want pharmacy data, I imagine you could get at it by going to work as a researcher at a pharma company, an insurer, or a retailer.
AlphaGo was built using publicly available records of human expert Go games. AlphaGo Zero didn't depend on human game data at all; it learned from self-play.
For AI, the limiting factors are ideas, code, time, and hardware.