Hacker News
by ACCount37, 237 days ago
It is true. Datasets are somewhat cleaned, but only somewhat. When you have terabytes' worth of text, there's only so much cleaning you can do economically.