by exmadscientist 1150 days ago
"RHLF" is Reinforcement Learning from Human Feedback? (Strange acronym.)

I don't see how that helps a tool become useful to a very skilled person if the "human" side of things is polluted by 95+% of users having very low skill. It's great that you can train LLMs on the world's best reference material! But I don't see how you can get the world's best updates into that training set without hiring the world's best experts. So the tool will have very little value for anyone above a certain skill ceiling. Search has already fallen victim to this effect (I'm tired of results pages full of beginner material when I have a deeper question!), and I can't see it playing out any better for the training sets used to augment LLMs.


It should be RLHF, my bad with the spelling.

> So the tool will have very little value for anyone above a certain skill ceiling.

LLMs aren't great for tasks you don't know how to do, because you'll eventually have to debug the output. However, they excel at time-consuming tasks that you could do yourself if you really wanted to.

That's why I think they'll actually be more useful for experts.