by xg15 1253 days ago

Ok, so OpenAI says that ChatGPT is GPT-3.5, but with extensive fine-tuning applied, based on a complex multi-stage feedback process with human evaluators.

But at the same time, you can apparently just take the "raw" GPT-3.5, give it a prompt to behave like an assistant and get comparable results?

So was the whole RLHF process just cargo cult?

karfly 1253 days ago

IMO all the RLHF stuff is mainly about aligning the model so that it doesn't reply with offensive or inappropriate answers, NOT about making the model's answers better.

xg15 1253 days ago

Ah, that makes sense. Thanks!
