by civilitty 1106 days ago
People training AI were already using CommonCrawl. There are too many data sources to figure out each one's API, so everyone just downloads CC from AWS.