by reissbaker 546 days ago
GPT-3 had 175B parameters, and the original ChatGPT was probably just a GPT-3 finetune (although they called it gpt-3.5, so it could have been different). However, it was severely undertrained. Llama-3.1-8B is better in most ways than the original ChatGPT; a well-trained ~70B usually feels GPT-4-level. The latest Llama release, llama-3.3-70b, goes toe-to-toe even with much larger models (although it is bad at coding, like all Llama models so far; that's not inherent to the size, since Qwen is good at coding, so I'm hoping the Llama 4 series is trained on more coding tokens).

> However, it was severely undertrained

By modern standards. At the time, it was trained according to the neural scaling laws OpenAI believed to hold.

Sure, at the time everyone misunderstood Chinchilla. Nonetheless it was severely undertrained, even if they didn't know it back then.
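
For a rough sense of scale, here is a back-of-the-envelope sketch of "undertrained" in terms of training tokens per parameter. The token counts are approximate public figures (about 300B tokens for GPT-3 per its paper, about 15T for Llama 3.1 per Meta's model card), and the 20 tokens/param target is the usual rule-of-thumb reading of Chinchilla, not an exact law. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope tokens-per-parameter comparison.
# All figures are approximate public numbers, not exact training details.

# Assumption: the common "about 20 tokens per parameter" reading of Chinchilla.
CHINCHILLA_TOKENS_PER_PARAM = 20

# name -> (parameter count, approximate training tokens)
models = {
    "GPT-3 175B": (175e9, 300e9),
    "Llama-3.1-8B": (8e9, 15e12),
}


def tokens_per_param(params: float, tokens: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


for name, (params, tokens) in models.items():
    ratio = tokens_per_param(params, tokens)
    relative = ratio / CHINCHILLA_TOKENS_PER_PARAM
    print(f"{name}: {ratio:.1f} tokens/param ({relative:.2f}x the rule of thumb)")
```

By this rough measure GPT-3 saw under 2 tokens per parameter, while Llama-3.1-8B saw well over a thousand, which is far past the Chinchilla rule of thumb in the other direction.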