Hacker News
by yvbbrjdr, 250 days ago
We actually profiled one of the models and saw that the last GEMM, which is completely memory-bound, takes up a large share of the time and cuts token throughput significantly.
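For readers who want to see why a GEMM like this can end up memory-bound, here is a rough roofline-style sketch. Everything in it is an assumption made for illustration: that the last GEMM is a batch-size-1 projection from the hidden size to the vocabulary, the shapes (4096 x 32000), the fp16 element size, and the hardware numbers. None of these come from the profile mentioned above.

```python
def gemm_flops(m: int, k: int, n: int) -> int:
    """FLOPs for an (m x k) @ (k x n) matrix multiply (one multiply + one add per term)."""
    return 2 * m * k * n


def gemm_bytes(m: int, k: int, n: int, bytes_per_elem: int = 2) -> int:
    """Minimum bytes moved: read both inputs once, write the output once."""
    return bytes_per_elem * (m * k + k * n + m * n)


def arithmetic_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs performed per byte of memory traffic."""
    return gemm_flops(m, k, n) / gemm_bytes(m, k, n, bytes_per_elem)


def is_memory_bound(m: int, k: int, n: int,
                    peak_flops: float, mem_bandwidth: float,
                    bytes_per_elem: int = 2) -> bool:
    """Memory-bound when intensity is below the ridge point peak_flops / mem_bandwidth."""
    ridge = peak_flops / mem_bandwidth
    return arithmetic_intensity(m, k, n, bytes_per_elem) < ridge


def lower_bound_seconds(m: int, k: int, n: int,
                        peak_flops: float, mem_bandwidth: float,
                        bytes_per_elem: int = 2) -> float:
    """Roofline lower bound on runtime: the slower of compute time and memory time."""
    compute_time = gemm_flops(m, k, n) / peak_flops
    memory_time = gemm_bytes(m, k, n, bytes_per_elem) / mem_bandwidth
    return max(compute_time, memory_time)


if __name__ == "__main__":
    # Hypothetical shapes: one token (m=1), hidden size 4096, vocabulary 32000, fp16.
    m, k, n = 1, 4096, 32000
    # Hypothetical hardware: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
    peak_flops, mem_bandwidth = 100e12, 1e12

    print("arithmetic intensity (FLOP/byte):", arithmetic_intensity(m, k, n))
    print("memory-bound:", is_memory_bound(m, k, n, peak_flops, mem_bandwidth))
    print("lower-bound time (s):", lower_bound_seconds(m, k, n, peak_flops, mem_bandwidth))
```

With a single token per step, the weight matrix has to be read in full for roughly one FLOP per byte, so under these assumed numbers the runtime floor is set by memory bandwidth rather than compute. Raising `m` (batching more tokens) raises the intensity, which is one reason batch size matters for this kind of layer.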
1 comment
lostmsu, 250 days ago
The parent is right, the issue is on your side.