by qwerty3344 1147 days ago
I think it's still significantly behind GPT-3.5/4, both of which can get 67% on HumanEval and 88% with Reflexion.

Keep in mind that StarCoder(Base) is just a pretrained LM. The extra stuff that makes 3.5/4 what they are, like RLHF, gets built on top of a base model like this.
Aren't GPT-3 etc. the base LMs, and ChatGPT the instruction-tuned one? Or am I wrong?
code-davinci-002 is a base LM, and the other 3.5 models (text-davinci-{002,003}, gpt-3.5-turbo, and ChatGPT) use instruction tuning and/or RLHF. Source: https://platform.openai.com/docs/model-index-for-researchers