by roadside_picnic 5 hours ago

M3 Max laptop: ~55 tokens/sec

RTX 4090: ~190 tokens/sec

I don't have the number handy, but there is notable pre-fill latency on the M3. Once it's generating, though, the delay is negligible.
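To put the two rates in perspective, here's a rough back-of-the-envelope sketch. It assumes a steady decode rate and treats pre-fill as a fixed delay up front. The function name and the `prefill_sec` parameter are made up for illustration, and `prefill_sec` defaults to 0 because the pre-fill latency wasn't measured.

```python
def response_seconds(n_tokens: int, tokens_per_sec: float, prefill_sec: float = 0.0) -> float:
    """Estimate wall-clock time: pre-fill delay plus decode time at a steady rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return prefill_sec + n_tokens / tokens_per_sec


# Approximate decode rates quoted above (tokens/sec).
RATES = {"M3 Max": 55.0, "RTX 4090": 190.0}

for name, rate in RATES.items():
    # prefill_sec is left at 0 here since that number is unknown.
    print(f"{name}: {response_seconds(1000, rate):.1f} s for 1000 tokens")
```

For a 1000-token reply that works out to roughly 18 seconds on the M3 versus roughly 5 on the 4090, before adding whatever the pre-fill delay turns out to be.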

The RTX, unsurprisingly, is superior all around performance-wise, but I use that computer for gaming and image-gen work, so I can't dedicate it as a server. Also, especially when it's warmer, the heat generated under heavy load is noticeable.