by benxh 842 days ago
If GPT-4 is 220B total across 8 experts, that would be in line with 3.5 Turbo being a 20B model, and GPT-4 activating 55B out of a total 220B parameters (27.5B per expert, so presumably 2 experts active per token).

It is ultimately all speculation until DeepSeek releases their own 145B MoE model; then we can compare the activations and results.

1 comment

I think the conjecture is that each expert of GPT-4 has 220B parameters, for a total of 1.76T parameters.
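
For what it's worth, here is a rough sketch of the arithmetic behind the two readings in this thread. Everything in it is rumor-level: the 8-expert count and the 220B figure come from the comments above, while top-2 routing (`ACTIVE_EXPERTS = 2`) is an assumption on my part, and the helper `moe_sizes` is a made-up name that ignores any parameters shared across experts (attention, embeddings).

```python
# Back-of-the-envelope arithmetic for the two readings of "220B / 8 experts".
# All figures are rumored, not confirmed.

NUM_EXPERTS = 8
ACTIVE_EXPERTS = 2  # ASSUMPTION: top-2 routing, not stated in the thread


def moe_sizes(per_expert_b, num_experts=NUM_EXPERTS, active=ACTIVE_EXPERTS):
    """Return (total, active) parameter counts in billions.

    Simplification: treats the model as experts only, with no shared layers.
    """
    total = per_expert_b * num_experts
    active_params = per_expert_b * active
    return total, active_params


# Reading 1: 220B is the total, split evenly across 8 experts (27.5B each).
reading_1 = moe_sizes(220 / NUM_EXPERTS)

# Reading 2: 220B is the size of each expert.
reading_2 = moe_sizes(220)

print(reading_1)  # (220.0, 55.0)
print(reading_2)  # (1760, 440)
```

Under the first reading you get 55B active out of 220B; under the second, 1.76T total with roughly 440B active per token, if the top-2 assumption holds.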