by ad_hominem
1508 days ago
> Pushshift.io Reddit corpus

Pushshift is a single person with some very strong political opinions who has specifically used his datasets to attack political opponents. Frankly, I wouldn't trust his data to be untainted. These models really need to be trained on more official data sources, or at least something with some type of multi-party oversight rather than data that effectively fell off the back of a truck.

edit: That's not even to mention I believe it's flat-out illegal for him to collect and redistribute this data as Reddit users did not agree to any terms of use with him. Just look at the disastrous mess of his half-baked "opt-out" thing that flagrantly violates GDPR: https://www.reddit.com/r/pushshift/comments/pat409/online_re...