by xg15 1253 days ago
Ok, so OpenAI says that ChatGPT is GPT-3.5, but with extensive fine-tuning applied, based on a complex multi-stage feedback process with human evaluators.

But at the same time, you can apparently just take the "raw" GPT-3.5, give it a prompt to behave like an assistant and get comparable results?

So was the whole RLHF process just a cargo cult?

1 comments

IMO all the RLHF stuff is mainly about aligning the model so that it doesn't reply with offensive or inappropriate answers, NOT about making the model's answers better.
Ah, that makes sense. Thanks!