by aurareturn 332 days ago
Probably because they are loading the entire model into SRAM. That's how they can achieve 1.5k tokens/s.
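A rough back-of-envelope sketch of why that would follow, assuming single-stream decoding is memory-bandwidth bound (every weight is read once per generated token). The model size, precision, and bandwidth figures below are hypothetical placeholders, not specs of any real chip or model, and the function names are made up for illustration:

```python
def decode_tokens_per_second(param_count, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on decode speed if each token requires reading all weights once."""
    model_bytes = param_count * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


def required_bandwidth(target_tokens_per_s, param_count, bytes_per_param):
    """Memory bandwidth (bytes/s) needed to hit a target decode speed under the same assumption."""
    return target_tokens_per_s * param_count * bytes_per_param


# Hypothetical inputs: an 8B-parameter model stored at 2 bytes per parameter.
PARAMS = 8_000_000_000
BYTES_PER_PARAM = 2

# Hypothetical off-chip memory at 2 TB/s.
offchip = decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, 2_000_000_000_000)
print(f"off-chip bound: {offchip:.0f} tokens/s")

# Bandwidth needed for 1.5k tokens/s under these assumptions.
needed = required_bandwidth(1500, PARAMS, BYTES_PER_PARAM)
print(f"needed for 1.5k tokens/s: {needed / 1e12:.0f} TB/s")
```

Under these made-up numbers the off-chip bound is about 125 tokens/s, while 1.5k tokens/s would need roughly 24 TB/s, which is the kind of bandwidth you would expect only from on-chip SRAM. Batching, speculative decoding, and sparsity all change the picture, so treat this as an order-of-magnitude argument only.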