| HN Mirror



	by qwerty3344 1147 days ago
	I think it's still significantly behind GPT-3.5/4, both of which can get 67% on HumanEval, and 88% with Reflexion.
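HumanEval percentages like these are usually pass@1 scores. The sketch below shows how such a number is typically computed, using the unbiased pass@k estimator from the original Codex/HumanEval paper (Chen et al., 2021). It is not the evaluation harness any of these models was actually scored with. The function names `pass_at_k` and `benchmark_score` are hypothetical, and the sketch assumes you already know, for each problem, how many of `n` generated samples passed the unit tests (`c`).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests."""
    if n - c < k:
        # Every size-k subset of the n samples contains at least one passing sample.
        return 1.0
    # 1 - P(a random size-k subset contains only failing samples)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over all problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical data: two problems, 10 samples each.
    print(benchmark_score([(10, 10), (10, 0)], k=1))
```

For k = 1 the estimator reduces to c/n, so pass@1 is simply the fraction of samples that pass, averaged over the problems.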


Keep in mind that StarCoder(Base) is just a pretrained LM. The extra work that makes GPT-3.5/4 what they are, such as RLHF, is built on top of a base model like this.

Aren't GPT-3 etc. the base LMs, and ChatGPT the instruction-tuned model? Or am I wrong?

code-davinci-002 is a base LM, and the other GPT-3.5 models (text-davinci-{002,003}, gpt-3.5-turbo, and ChatGPT) use instruction tuning and/or RLHF. Source: https://platform.openai.com/docs/model-index-for-researchers