Hacker News
by isodev 1209 days ago
I wonder the same. Also, scraping for training data feels like something that should be opt-in. I really have a problem with the stance that a piece of data is fair game just because it’s technically accessible. It also undermines the lineage and trustworthiness of the final model, e.g. how does one verify that a model’s predictive outcomes are in line with expectations?

Conversely, an opt-in dataset would surely consist of 99.99% spam.
I think that's easily avoidable - one wouldn't reach out to "spammy" sources in the first place.