	by benxh 889 days ago
	If GPT-4 is 220B total split across 8 experts, that would be in line with 3.5 Turbo being a 20B model, and with GPT-4 activating 55B out of a total of 220B parameters. It is all speculation until DeepSeek releases their own 145B MoE model; then we can compare the activations and results.

1 comment

I think the conjecture is that each expert of GPT-4 has 220B parameters, for a total of 1.76T parameters.
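
For what it's worth, the two readings in this thread differ only in whether 220B is the total or the per-expert size. Below is a rough sketch of the arithmetic. The function names are made up for illustration, the `active_experts=2` default is my own assumption (it is what makes 55B fall out of 220B/8), and shared non-expert parameters are ignored. None of these figures are confirmed.

```python
# Back-of-the-envelope arithmetic for the two readings in this thread.
# All sizes are rumoured figures in billions of parameters, not confirmed.
# Simplification: every parameter is assumed to live inside an expert
# (shared layers such as attention are ignored).

NUM_EXPERTS = 8


def reading_total_220b(total_b=220.0, active_experts=2):
    """Parent comment: 220B is the total across all experts.

    active_experts=2 is an assumption, chosen because it reproduces
    the 55B activation figure quoted in the comment.
    """
    per_expert_b = total_b / NUM_EXPERTS
    return {
        "per_expert_b": per_expert_b,
        "active_b": per_expert_b * active_experts,
        "total_b": total_b,
    }


def reading_per_expert_220b(per_expert_b=220.0):
    """Reply: 220B is the size of each expert."""
    return {
        "per_expert_b": per_expert_b,
        "total_b": per_expert_b * NUM_EXPERTS,
    }


if __name__ == "__main__":
    print(reading_total_220b())
    print(reading_per_expert_220b())
```

Under the first reading each expert would be 27.5B with 55B active; under the second the total comes to 1760B, i.e. the 1.76T in the reply.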