HN Mirror



	by HaZeust 698 days ago
	Probably because the benchmark gains from larger models are, at this time, negligible. Scaling up transformers and iterating on attention might hit a dead end for building more capable models beyond 2T parameters. But I'm not sure.