by visarga
3270 days ago
> they have access, and can purchase, the largest and best data sets available

Google might have an advantage in personal data, which can be used for advertising and health. But general data, such as image and NLP datasets, is publicly available and growing fast. Google's advantage in datasets is specific and limited, mostly to ads.
|
For example, here are some of their recent NLP datasets: https://github.com/google-research-datasets
In images, OpenImages is theirs, and there are assorted ones derived from YouTube.
Stanford's SNLI is the most recent non-Google NLP dataset that is getting used a lot. bAbI (from Facebook) too, if you count that as NLP.