by benxh
842 days ago
If GPT4 is 220B split across 8 experts, that would be in line with 3.5 Turbo being a 20B model and GPT4 activating 55B out of a total of 220B parameters. It is ultimately all speculation until DeepSeek releases its own 145B MoE model; then we can compare the activations and results.
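
For anyone checking the arithmetic, here is a rough sketch of how 55B falls out of 220B. It assumes 8 equally sized experts, 2 experts routed per token, and no shared (attention/embedding) parameters. None of that is confirmed; the function name and the top-2 routing are my own assumptions for illustration.

```python
def active_params_b(total_b: float, n_experts: int, experts_per_token: int) -> float:
    """Active parameters (in billions) per token for a toy MoE model.

    Assumes all parameters live in equally sized experts; real MoE models
    also have shared layers, so this is only a back-of-the-envelope figure.
    """
    per_expert_b = total_b / n_experts
    return per_expert_b * experts_per_token


# Speculated figures from the comment: 220B total, 8 experts.
# Top-2 routing is an assumption chosen because it reproduces the 55B figure.
print(active_params_b(220, 8, 2))  # 55.0
```

With a single active expert the same numbers would give 27.5B instead, so the 55B figure only holds under the top-2 assumption.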