Hacker News
by
reset-password
1151 days ago
LLMs already have problems with fact vs fiction. I don't see how Reddit of all places has "valuable data" in that regard.
4 comments
uptownfunk
1151 days ago
I think the value is in the examples it provides of language.
nekoashide
1151 days ago
Filtering on the top-upvoted comments can remove the useless information, so the model can then be trained on actual data and refined.
Arrath
1151 days ago
Except when top-voted comments are hivemind-approved 'funny' quips or responses, or replies to exercises in creative writing, like half the posts in relationshipadvice, iwantthemanager, nuclear/pettyrevenge, etc.
aydyn
1151 days ago
Is this a joke that I'm missing? Top Reddit posts are frequently trash, filled with misinformation.
minimaxir
1151 days ago
Many popular LLMs already include large amounts of Reddit comment data, which is (usually) cited in their respective papers.
surgical_fire
1150 days ago
Reddit also has a problem with fact vs fiction.