| HN Mirror



	by sputknick 796 days ago
	I'm not OP, but George Hotz said on the Lex Fridman podcast a while back that it was an MoE of 8 × 250B. Subtract out the duplicated attention parameters and you get something right around 1.8T.
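The arithmetic behind that estimate can be sketched as follows. This assumes each of the 8 experts is quoted as a full 250B model, and that the attention parameters being "subtracted out" are stored only once and shared across experts. The function names are hypothetical. The shared-parameter figure is not a reported number; it is only what the quoted 250B and 1.8T figures would imply under that assumption.

```python
def moe_total_params(n_experts: int, params_per_expert: float, shared_params: float) -> float:
    """Total parameters (billions) when each expert is quoted as a full model,
    but `shared_params` of it (assumed: attention) is stored only once."""
    # Count every expert in full, then remove the duplicate copies of the shared part.
    return n_experts * params_per_expert - (n_experts - 1) * shared_params


def implied_shared_params(n_experts: int, params_per_expert: float, total: float) -> float:
    """Solve moe_total_params for shared_params, given a reported total."""
    return (n_experts * params_per_expert - total) / (n_experts - 1)


naive_total = 8 * 250  # 2000B, i.e. 2T before any deduplication
shared = implied_shared_params(8, 250, 1800)  # hypothetical: what 1.8T would imply

print(f"naive total: {naive_total}B")
print(f"implied shared parameters: {shared:.1f}B")
print(f"check: {moe_total_params(8, 250, shared):.0f}B")
```

Under these assumptions, getting from 2T down to roughly 1.8T requires about 28.6B of shared parameters, counted once instead of eight times.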

1 comment

I'm pretty sure he suggested it was a 16-way MoE of 110B experts.

The exact quote: "Sam Altman won’t tell you that GPT 4 has 220 billion parameters and is a 16 way mixture model with eight sets of weights."
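Both readings give the same total, close to the ~1.8T figure in the parent comment. Here is a quick check. It assumes that "110" and "220 billion" are per-expert (or per-weight-set) counts in billions, which is an interpretation and not something the quote states. It also ignores any shared attention weights.

```python
# Two readings of the quoted configuration; all figures in billions of parameters.
readings = {
    "16-way MoE of 110B experts": 16 * 110,
    "eight sets of 220B weights": 8 * 220,
}

for label, total in readings.items():
    print(f"{label}: {total}B (~{total / 1000:.2f}T)")
```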