by nmfisher 1273 days ago
Just wanted to point out that ChatGPT is more than just a language model - from OpenAI's (very brief) description, it was also trained with reinforcement learning to select/rank the "best" answer [0].

I think the distinction is important because I suspect it explains why ChatGPT succeeds at certain tasks when previous LM-only models failed miserably.

[0] https://openai.com/blog/chatgpt/


Yes, that's the difference between a plain language model like GPT-3 and a "task aligned" one like ChatGPT (which is based on GPT-3.5).

I'd still describe it as a language model, just one with "filtered" output.
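
To make "filtered" concrete, here is a toy sketch of one loose reading of it: generate several candidate answers, score each one, and keep the highest-scoring. Everything in it is an assumption for illustration. `pick_best` and `toy_score` are hypothetical names, and `toy_score` is a hand-written stand-in for a learned preference model. It is not a description of OpenAI's actual pipeline.

```python
from typing import Callable, List


def pick_best(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (best-of-n selection)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def toy_score(answer: str) -> float:
    # Hypothetical stand-in for a learned reward model:
    # reward distinct words, penalize a non-answer like "idk".
    words = answer.split()
    penalty = 5.0 if "idk" in answer.lower() else 0.0
    return len(set(words)) - penalty


# Pretend these came from sampling a plain language model several times.
candidates = [
    "idk",
    "Paris is the capital of France.",
    "France France France",
]

best = pick_best(candidates, toy_score)
print(best)
```

In a real system the scorer would be learned from human rankings of model outputs rather than written by hand.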

I'm not sure whether ChatGPT itself has been documented, but it's very similar to OpenAI's InstructGPT, which they have described and which they still refer to as a language model.

> We’ve trained language models that are much better at following user intentions than GPT-3 while also making them more truthful and less toxic, using techniques developed through our alignment research. These InstructGPT models, which are trained with humans in the loop, are now deployed as the default language models on our API.

https://openai.com/blog/instruction-following/