by rfw300 893 days ago
The paper refers to ChatGPT as a 175B parameter LLM. This is almost certainly incorrect; the original largest version of GPT-3 was 175B but analysis of the speed and cost of the current model as well as public statements by OpenAI indicate it’s as much as 5-10x smaller.
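For scale, here is a rough sketch of what "5-10x smaller" implies. The 175B figure and the 5-10x factors come from the comment above; the helper name `implied_size_range` is purely illustrative, and none of this is an official number.

```python
def implied_size_range(original_b, min_factor, max_factor):
    """Return (smallest, largest) size in billions of parameters for a
    model that is between min_factor and max_factor times smaller."""
    return original_b / max_factor, original_b / min_factor


# 175B is the original GPT-3 size; 5 and 10 are the claimed shrink factors.
low, high = implied_size_range(175, 5, 10)
print(f"{low}B to {high}B")  # 17.5B to 35.0B
```

So the claim amounts to a model somewhere between roughly 17.5B and 35B parameters.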

I think it was leaked that it is 20B now.
It was listed as 20B in a comparison table in a paper co-written by Microsoft, but they've since claimed that was just an error. And I mean, they'd need to be sitting on some really impressive distillation techniques to shrink a 175B model down to 20B with only a slight drop in performance.
OpenAI have been sitting on GPT-4 for months, and on the base model even longer. I would not be surprised if they did some or all of the distillation of the model with GPT-4.

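Mixtral is nominally 56B combined (8x7B; the true total is closer to 47B since the experts share attention layers). If we subtract a little for MoE inefficiencies, we could say that Mixtral is equivalent to about 40B. That is 2x the size of a 20B model, and we have seen new models beat others twice their size.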

That and a massive amount of excellent data for alignment should produce some great results.

I don't think it's out of the realm of possibility that 20B is real.
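
The size comparison above works out as in the quick sketch below. All figures are the ones quoted in this thread (the 40B "effective" size is a rough guess, and 20B is a rumor), and `size_ratio` is just an illustrative helper.

```python
# Sizes in billions of parameters, as quoted in the thread (not official).
MIXTRAL_NOMINAL_B = 56    # nominal combined size given in the comment
MIXTRAL_EFFECTIVE_B = 40  # commenter's rough guess after MoE overhead
RUMORED_CHATGPT_B = 20    # rumored size from the comparison table


def size_ratio(larger_b, smaller_b):
    """How many times bigger the first model is than the second."""
    return larger_b / smaller_b


print(size_ratio(MIXTRAL_EFFECTIVE_B, RUMORED_CHATGPT_B))  # 2.0
print(size_ratio(MIXTRAL_NOMINAL_B, RUMORED_CHATGPT_B))    # 2.8
```

Either way the gap is in the 2-3x range, which is the kind of size difference newer models have been shown to overcome.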