| HN Mirror



	by jychang 146 days ago
	They didn't do something stupid like Llama 4 "one active expert", but 4 of 256 is very sparse. It's not going to get close to DeepSeek- or GLM-level performance unless they trained on the benchmarks. I don't think that was a good move. No other models do this.
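
	For scale, here is a rough sketch of the arithmetic behind "4 of 256". The helper name `active_fraction` is made up for illustration, and it only counts routed experts; it assumes nothing about shared experts, attention, or embedding weights, so it is not the model's active-parameter fraction.

```python
def active_fraction(active_experts: int, total_experts: int) -> float:
    """Fraction of routed experts that fire for each token (hypothetical helper)."""
    if not 0 < active_experts <= total_experts:
        raise ValueError("need 0 < active_experts <= total_experts")
    return active_experts / total_experts


# 4 of 256 is the figure quoted in the comment above.
ratio = active_fraction(4, 256)
print(f"{ratio:.4%} of routed experts active per token")  # prints "1.5625% ..."
```

	So each token touches roughly 1 in 64 of the routed experts per MoE layer.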