Y
Hacker News
new
|
ask
|
show
|
jobs
by
atgctg
455 days ago
The paper's Table 7 shows DyT reducing overall LLaMA 7B inference time by 7.8% and training time by 8.2%. That is not insignificant.
1 comments
Herring
455 days ago
But LLM performance scales according to the log of compute, so yeah it’s pretty insignificant. I think we’ve reached a bit of a plateau.
link