| HN Mirror



	by victor106 1101 days ago
	Anyone here know where we can find more resources on RLHF? There’s been a lot written about transformer models etc., but I wasn’t able to find much about RLHF.

4 comments

senko 1101 days ago

Blog post from Huggingface: https://huggingface.co/blog/rlhf

Webinar on the same topic (from the same HF folks): https://www.youtube.com/watch?v=2MBJOuVq380&t=496s

RLHF as used by OpenAI in InstructGPT (predecessor to ChatGPT): https://arxiv.org/abs/2203.02155 (academic paper, so much denser than the above two resources)


samstave 1101 days ago

It will be interesting when we have AI doing RLHF on other AIs, based on itself having been RLHF'd, giving us an iterative loop of AI model reinforcement...

We talk of 'hallucinations', but what we won't get is AI malfeasance identified by AI RLHF trickery/lying?


z3c0 1101 days ago

This is essentially the premise behind Generative Adversarial Networks, and if you've seen the results, they're astounding. They're much better for specialized tasks than their generalized GPT counterparts.


samstave 1100 days ago

Please expand on this?


z3c0 1100 days ago

Sure thing - if you've seen "This Person Does Not Exist", it is the product of GANs: https://thispersondoesnotexist.xyz/

GANs pair a generative model with a classification model (neither needs hand-labeled data) whose loss functions have been designed to be antithetical. Basically, one performing well means the other is performing poorly. Keeping with the example posed by the given link, this results in a kind of hyper-optimization that causes the generative model to gradually home in on the perfect way to render a face, while the classification model keeps pace with it and feeds back that "I don't see a face" until something resembling a face emerges. With this approach, you can start with complete noise and end up at a photorealistic face.
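To make "antithetical" concrete, here is a minimal sketch assuming the classic minimax GAN formulation. The function names and the probe values are illustrative only and don't come from any particular library. There are no networks here, just the two loss functions evaluated on hypothetical classifier outputs.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Cross-entropy loss for the classification model (discriminator).

    d_real: probability it assigns to "real" for a genuine image.
    d_fake: probability it assigns to "real" for a generated image.
    Low when d_real is near 1 and d_fake is near 0.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Minimax generator loss: low when the classifier is fooled (d_fake near 1)."""
    return math.log(1.0 - d_fake)


if __name__ == "__main__":
    d_real = 0.9  # assumed fixed, purely for illustration
    for d_fake in (0.1, 0.5, 0.9):
        print(
            f"d_fake={d_fake:.1f}  "
            f"D loss={discriminator_loss(d_real, d_fake):.3f}  "
            f"G loss={generator_loss(d_fake):.3f}"
        )
```

As d_fake rises (the generated face gets more convincing), the generator's loss falls and the classifier's loss rises by the same amount. Many practical implementations swap in a non-saturating generator loss instead, so treat this as the textbook version rather than what any given codebase does.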


p1esk 1100 days ago

Lately diffusion models have surpassed GANs in pretty much every way. They don’t have any of the adversarial dynamics you described.


hansvm 1101 days ago

It's not the first paper on the topic IIRC, but OpenAI's InstructGPT paper [0] is decent and references enough other material to get started.

The key idea is that they're able to start with a model trained on large amounts of relatively garbage unsupervised data (the internet), and use that model to cheaply generate decent amounts of better data (ranking generated content rather than spending the man-hours to actually write good content). The other details aren't too important.
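A rough sketch of how rankings become training signal, assuming the pairwise comparison setup commonly used for reward models. The helper names and the reward numbers below are made up for illustration; in practice the rewards would come from a learned reward model, not hard-coded floats.

```python
import itertools
import math


def comparisons_from_ranking(ranked_outputs):
    """Expand one human ranking (best first) into (preferred, rejected) pairs."""
    return list(itertools.combinations(ranked_outputs, 2))


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-diff)).

    Small when the reward model scores the human-preferred output higher.
    """
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))


if __name__ == "__main__":
    # One labeler ranks four generated answers, best first (hypothetical labels).
    pairs = comparisons_from_ranking(["A", "B", "C", "D"])
    print(len(pairs), "comparisons from a single ranking:", pairs)

    # Hypothetical reward-model scores for a (preferred, rejected) pair.
    print("agrees with human:   ", pairwise_ranking_loss(2.0, 0.0))
    print("disagrees with human:", pairwise_ranking_loss(0.0, 2.0))
```

This is where the cheapness comes from: one ranking of K outputs yields K*(K-1)/2 comparisons, and ranking is much faster for a labeler than writing a good answer from scratch.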

[0] https://arxiv.org/abs/2203.02155


SleekEagle 1101 days ago

My colleague wrote a couple of pieces that talk about RLHF:

1. https://www.assemblyai.com/blog/the-full-story-of-large-lang... (you can scroll to "What RLHF actually does to an LLM" if you're already familiar with LLMs)

2. https://www.assemblyai.com/blog/how-chatgpt-actually-works/


rounakdatta 1101 days ago

There's also this exhaustive post from the one and only Chip Huyen: https://huyenchip.com/2023/05/02/rlhf.html
