Hacker News
by
Oras
205 days ago
Hard time? What value do the descriptions, view counts, and comments of adult videos add to small (7B and 32B) models?
2 comments
andy99
205 days ago
It says it’s Common Crawl, which I interpret to mean a generic web-scrape dataset. Presumably they filter out the stuff they don’t want before pretraining. You’d have to do some ablation testing to know what value it adds.
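To make "filter stuff out before pretraining" concrete, here is a minimal sketch of one common kind of pre-pretraining filter: dropping pages whose host is on a domain blocklist or whose text contains a blocked term. This is not the pipeline used for any particular model. The domains, terms, and the function names `is_blocked` and `filter_docs` are hypothetical, chosen for illustration. Real pipelines use much larger curated lists and often learned classifiers (the C4 dataset, for example, dropped pages containing any word from a published "bad words" list).

```python
from urllib.parse import urlparse

# Hypothetical blocklists for illustration only.
BLOCKED_DOMAINS = {"adult-example.com", "videos-example.net"}
BLOCKED_TERMS = {"xxx", "porn"}


def is_blocked(url: str, text: str) -> bool:
    """Return True if the page's host or its text trips a blocklist."""
    host = (urlparse(url).hostname or "").lower()
    # Match the blocked domain itself or any of its subdomains.
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
        return True
    # Crude whole-word match on whitespace-separated, lowercased tokens.
    words = set(text.lower().split())
    return bool(words & BLOCKED_TERMS)


def filter_docs(docs: list[dict]) -> list[dict]:
    """Keep only documents (dicts with 'url' and 'text') that pass the filter."""
    return [d for d in docs if not is_blocked(d["url"], d["text"])]


if __name__ == "__main__":
    sample = [
        {"url": "https://www.adult-example.com/v/1", "text": "some video"},
        {"url": "https://en.wikipedia.org/wiki/Common_Crawl", "text": "Common Crawl is a nonprofit"},
        {"url": "https://blog.example.org/post", "text": "XXX rated title"},
    ]
    for doc in filter_docs(sample):
        print(doc["url"])
```

Whether a filter like this helps or hurts a 7B or 32B model is exactly the question an ablation would answer: train otherwise identical models with and without the filtered documents and compare them on the same evaluations.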
ccgreg
202 days ago
Common Crawl is a particular dataset. commoncrawl.org
khimaros
205 days ago
what if that's where they learned how to utilize the double entendre? hard times indeed.