| HN Mirror



	by brofallon 1166 days ago
	To use RLHF you need a dataset that includes instructions with good & bad answers - do many of those exist? I know there are a few datasets of just plain instructions-with-responses, but I'm not aware of any that have both good and bad (or ranked) responses. Is that trivial, or an important missing element here?
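For concreteness, here is a minimal sketch of what a "good & bad answers" record tends to look like, and how a reward model would typically score a pair. The `PreferencePair` type and its field names are hypothetical, and the pairwise (Bradley-Terry style) loss is a common choice for reward-model training rather than something any particular dataset mandates.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One ranked example: hypothetical field names, not a standard schema."""
    prompt: str
    chosen: str    # the response a human preferred
    rejected: str  # the response a human ranked lower


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the preferred
    response higher than the rejected one, and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)


example = PreferencePair(
    prompt="Explain RLHF in one sentence.",
    chosen="RLHF fine-tunes a model using a reward learned from human rankings.",
    rejected="RLHF is a kind of database.",
)
```

Plain instruction-with-response datasets only fill the `prompt` and `chosen` slots; the `rejected` side is the part that has to come from human ranking or some proxy for it.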

4 comments

sdenton4 1166 days ago

All of the chat interfaces have little thumbs-up/down icons... that's where the boolean feedback comes from. If people stop using those, sentiment analysis on the human replies will likely go a long way.
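A rough sketch of how boolean thumb votes could be turned into chosen/rejected pairs, assuming the same prompt was answered more than once and the votes were logged as `(prompt, response, thumbs_up)` tuples. That log format and the `votes_to_pairs` helper are made up for illustration.

```python
from collections import defaultdict
from itertools import product


def votes_to_pairs(votes):
    """Pair every upvoted response with every downvoted one for the same prompt.

    votes: iterable of (prompt, response, thumbs_up) tuples (hypothetical format).
    """
    liked = defaultdict(list)
    disliked = defaultdict(list)
    for prompt, response, thumbs_up in votes:
        (liked if thumbs_up else disliked)[prompt].append(response)

    pairs = []
    for prompt, good_responses in liked.items():
        # Prompts with no downvoted response yield no pairs.
        for good, bad in product(good_responses, disliked.get(prompt, [])):
            pairs.append({"prompt": prompt, "chosen": good, "rejected": bad})
    return pairs


votes = [
    ("What is RLHF?", "A fine-tuning method using human rankings.", True),
    ("What is RLHF?", "A file format.", False),
    ("What is 2+2?", "4", True),
]
pairs = votes_to_pairs(votes)
```

Prompts that only ever got upvotes (or only downvotes) drop out here, which is one reason thumb data alone tends to be sparse as a preference signal.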


valine 1166 days ago

OpenAssistant has been collecting instruction/response data. They’ve already used that data to fine-tune several LLaMA models with good success.

You can also bootstrap RLHF training data from the GPT-4 API. Vicuna is probably the best public model created with GPT-4 data available as of today.


tzekid 1166 days ago

If I understood correctly, the OpenAssistant team wants to open-source their community-built RLHF dataset.

On the other hand, if you're being cheeky, I bet there's a way to mine data from websites like ShareGPT and profit off shared ChatGPT <> user interactions.


ttul 1166 days ago

You’re not supposed to do this, but GPT-4 can generate RLHF data.
