by changadera 1100 days ago

The thing is, along with all the large subreddits, it's all these niche subreddits that have helped train these LLMs to do the things they can do.

If Reddit thinks its content is king, then closing the subreddits that help generate that content isn't in its interest.

Do we know for sure which LLMs were trained on Reddit comments? I want to know if my comment history is in the corpus.
Yes, absolutely. Sam Altman has said so publicly, although he specifically said that social media wasn't particularly important as training data.

You can also see this when you mention davidjl, a user who was very into r/counting. I believe there was a thread about that yesterday.

OpenAI thanks you for your Reddit contributions.