| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by atgctg 503 days ago
	The paper's Table 7 shows DyT reducing overall LLaMA 7B inference time by 7.8% and training time by 8.2%. That is not insignificant.

1 comments

But LLM performance scales according to the log of compute, so yeah it’s pretty insignificant. I think we’ve reached a bit of a plateau.