by nmstoker
750 days ago
The deduplication discussion shows they don't filter out ads as part of their cleaning - I appreciate this could be risky and perhaps a huge processing step given dataset sizes, but intuitively it feels like it would cut the noise dramatically and thus help the signal within datasets.