| HN Mirror


	by embedding-shape 128 days ago
	Curious what the prefill and token generation speeds are. Apple hardware already seems embarrassingly slow at the prefill step and OK at token generation, but that's with far smaller models (1/4 the size). At this size? It might fit, but I'm guessing it will be all but unusable, sadly.

1 comments

regularfry 128 days ago

They're claiming 20+ tps inference on a MacBook with the Unsloth quant.


embedding-shape 128 days ago

Yeah, I'm guessing Mac users still aren't very fond of sharing how long the prefill takes. They usually share only the output tok/s, never the input.
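
To put rough numbers on why the input side matters, here is a back-of-the-envelope sketch. It assumes the simple model that the wait before the first token is the prompt length divided by the prefill rate, and the rest is output length divided by the generation rate. The 20 tok/s generation rate is the figure claimed upthread; the prefill rate, prompt length, and output length are made-up illustrative values, not measurements, and `response_time_s` is just a hypothetical helper name.

```python
def response_time_s(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Return (time_to_first_token, total_time) in seconds.

    Simplified model: prefill and generation each run at a constant rate,
    and nothing else (model loading, sampling overhead) is counted.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("token rates must be positive")
    time_to_first_token = prompt_tokens / prefill_tps
    total_time = time_to_first_token + output_tokens / decode_tps
    return time_to_first_token, total_time


# Hypothetical inputs: only decode_tps=20 comes from the thread.
ttft, total = response_time_s(
    prompt_tokens=8000,   # assumed long prompt, e.g. a coding-agent context
    output_tokens=500,    # assumed reply length
    prefill_tps=100.0,    # assumed prefill rate, NOT a measured figure
    decode_tps=20.0,      # the "20+ tps" claimed upthread
)
print(f"time to first token: {ttft:.0f} s, total: {total:.0f} s")
```

With those assumed inputs, most of the wall-clock time is spent before the first token appears, which is why the output tok/s alone doesn't say much about usability with long prompts.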
