Hacker News
by
regularfry
128 days ago
They're claiming 20+ tok/s inference on a MacBook with the Unsloth quant.
embedding-shape
128 days ago
Yeah, I'm guessing Mac users still aren't very fond of sharing how long prefill takes. They usually only share output tok/s, never input (prompt-processing) speed.
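A rough way to see why the prefill number matters: the whole prompt has to be processed before the first output token appears, so total time is roughly prompt tokens divided by prefill speed plus output tokens divided by decode speed. The sketch below is a simplified model assuming both rates stay constant (in practice they vary with context length and hardware). The 100 tok/s prefill rate and the token counts are hypothetical illustration values, not measurements; only the 20 tok/s decode figure comes from the claim above.

```python
def generation_time_s(prompt_tokens: int, output_tokens: int,
                      prefill_tps: float, decode_tps: float) -> float:
    """Estimate wall-clock seconds for one request (simplified model).

    Prefill processes the whole prompt before the first output token;
    decode then emits output tokens one at a time. Both rates are
    assumed constant, which is an approximation.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughputs must be positive")
    prefill_s = prompt_tokens / prefill_tps  # roughly the time to first token
    decode_s = output_tokens / decode_tps    # time spent generating output
    return prefill_s + decode_s


if __name__ == "__main__":
    # Hypothetical inputs: 20 tok/s decode (as claimed above) and an
    # assumed, unmeasured prefill rate of 100 tok/s for an 8000-token prompt.
    t = generation_time_s(prompt_tokens=8000, output_tokens=500,
                          prefill_tps=100.0, decode_tps=20.0)
    print(f"{t:.0f} s total")  # 8000/100 + 500/20 = 80 + 25 = 105 s
```

With these made-up numbers, the headline 20 tok/s accounts for only 25 of the 105 seconds; the rest is prefill, which a decode-only figure hides.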