Ask HN: How do AI companies collect data to train models?
1 point by piotrke 836 days ago
From the Internet, obviously, but how? Are they crawling every website out there by IP address or domain name? Or do they piggyback on Google? Or is there some all-Internet data store where they can just download the latest 'Internet data' dump?
1 comment

They use datasets like Common Crawl.
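
To give a rough idea of what "downloading a dump" looks like in practice, here is a minimal sketch of looking up a site in Common Crawl's public URL index. The crawl ID and the helper names (build_index_url, parse_index_response, lookup) are my own assumptions for illustration, and this is not necessarily how any particular AI company does it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: this crawl ID is only an example. Pick a current one from
# the list published at https://index.commoncrawl.org/.
CRAWL_ID = "CC-MAIN-2023-50"


def build_index_url(url_pattern, crawl_id=CRAWL_ID):
    """Build a query URL for the Common Crawl index server."""
    query = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{query}"


def parse_index_response(body):
    """Parse the newline-delimited JSON records the index returns."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def lookup(url_pattern, crawl_id=CRAWL_ID):
    """Fetch index records for a URL pattern (needs network access)."""
    with urlopen(build_index_url(url_pattern, crawl_id)) as resp:
        return parse_index_response(resp.read().decode("utf-8"))
```

Calling lookup("example.com") should return one record per captured page. As far as I know, each record carries a filename, offset and length that point into one of the crawl's archive files, so you can fetch just that slice instead of the whole dump.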