
by chewxy 832 days ago

I agree. There are other types of AI, with different applications, that do not need to be trained on the internet. The examples you gave, however, are ones where deep nets are extremely data hungry.

Take computer vision, for example - a "hello world" version of object recognition would use ImageNet, which has about 14 million hand-annotated images, or CIFAR-10, which has 60,000 images drawn from the 80 Million Tiny Images collection. That, of course, only sets the stage for differentiation through training data. Google's image recognition is far superior to other search engines'. Why? Because of Google's data set.

Any Tom, Dick, and Harry can create their own image recognition AI and train it on all the public datasets (COCO, CIFAR, ImageNet), but that's considered pretty baseline nowadays. The differentiator is what _other_ datasets you have.

Different datasets yield different results, regardless of the network. More data is better (usually).