Hacker News
by haxton 1018 days ago
gpt3.5 turbo is (most likely) Curie which is (most likely) 6.7b params. So, yeah, makes perfect sense that it can't compete with a 70b model on cost.
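To put a rough number on that cost gap: a common rule of thumb for dense transformers is about 2 FLOPs per parameter per generated token on the forward pass. The sketch below applies it to the two sizes mentioned above. The function name is made up for illustration, the 6.7b figure is the guess from this comment rather than a confirmed size, and the rule ignores attention over long contexts, batching, and hardware differences.

```python
def dense_flops_per_token(n_params: float) -> float:
    """Rule-of-thumb forward-pass cost for a dense model: ~2 FLOPs per parameter per token."""
    return 2.0 * n_params


# Sizes taken from the comment above; the 6.7b figure is a guess, not a confirmed number.
small = dense_flops_per_token(6.7e9)
large = dense_flops_per_token(70e9)

# How many times more compute per token the larger dense model needs under this approximation.
ratio = large / small
print(f"A 70b dense model needs roughly {ratio:.1f}x the compute per token of a 6.7b one")
```

Under this approximation the ratio is just the ratio of parameter counts, so it says nothing about pricing or serving efficiency.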
5 comments

gpt3.5 turbo is a new model, not Curie. As others have stated, it probably uses Mixture of Experts which lowers inference cost.
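A minimal sketch of why Mixture of Experts can lower inference cost: each token is routed to only a few experts, so the parameters actually used per token can be far fewer than the total. Every number and name below is hypothetical and chosen only to show the arithmetic. Nothing here is a known configuration of gpt3.5 turbo.

```python
def moe_param_counts(shared_params: float, params_per_expert: float,
                     n_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total, active-per-token) parameter counts for a simple MoE layout.

    shared_params: parameters every token uses (attention, embeddings, and so on)
    params_per_expert: parameters in a single expert
    n_experts: number of experts in the model
    top_k: number of experts each token is routed to
    """
    total = shared_params + n_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active


# Hypothetical layout: 8 experts, 2 used per token.
total, active = moe_param_counts(shared_params=10e9, params_per_expert=20e9,
                                 n_experts=8, top_k=2)
print(f"total: {total / 1e9:.0f}b params, active per token: {active / 1e9:.0f}b params")
```

In this made-up layout the model stores 170b parameters but only touches 50b per token, which is the sense in which MoE can be both large and relatively cheap to run.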
Is there a source on that? I've never seen anyone think it's below even 70B
Even if it's only 6.7b params, it still does a much better job at translation than llama 2 70b
If it's MOE that may explain why it's faster and better...
MOE?
I thought it was fairly well established that GPT 3.5 has something like 130B parameters and that GPT 4 is on the order of 600-1,000B
I remember:

- gpt-3.5 175b params

- gpt-4 1800b params