Hacker News
by simonw 4 hours ago
It uses fineweb, which is derived from Common Crawl, which is an unlicensed scrape of web pages.