Hacker News
by ccgreg 53 days ago
I don't know of anyone who uses Common Crawl as pre-training data without filtering it. We have an annotation system that lets people pick and choose which subsets they'd like to use.