Does this mean we load the 175B GPT-3 model first, then overwrite 1.3B of its parameters with InstructGPT?
I find this sentence difficult to understand:
> Our labelers prefer outputs from our 1.3B InstructGPT model over outputs from a 175B GPT-3 model
https://openai.com/research/instruction-following
I am a newbie, so please correct me if I am wrong.
From the GPT-3 paper, it looks like they have many variants, such as:
- GPT-3-350M
- GPT-3-1.3B
- GPT-3-2.7B
- GPT-3-6.7B
- GPT-3-13B
- GPT-3-175B
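Ada, Babbage, Curie, and Davinci line up closely with 350M, 1.3B, 6.7B, and 175B respectively. The names are pretty suggestive.
So the 1.3B InstructGPT model is a separate, much smaller model fine-tuned from the 1.3B GPT-3 variant. Nothing is loaded into or overwritten inside the 175B model; the quoted sentence just compares the outputs of two different models.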
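To make the size gap concrete, here is a minimal Python sketch. The `APPROX_PARAMS` mapping and the `size_ratio` helper are hypothetical names made up for illustration, and the numbers are only the approximate alignment mentioned above, not official figures.

```python
# Approximate parameter counts, based on the rough alignment described above.
# These values are an assumption for illustration, not an official spec.
APPROX_PARAMS = {
    "ada": 350e6,
    "babbage": 1.3e9,
    "curie": 6.7e9,
    "davinci": 175e9,
}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times more parameters `larger` has than `smaller`."""
    return APPROX_PARAMS[larger] / APPROX_PARAMS[smaller]


# Compare the 175B model with a 1.3B model.
ratio = size_ratio("davinci", "babbage")
print(f"The 175B model is roughly {ratio:.0f}x the size of the 1.3B model")
```

So the quote is saying that labelers preferred the outputs of a model more than a hundred times smaller, which is why it is notable.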