by roadside_picnic
5 hours ago
M3 Max laptop: ~55 tokens/sec
RTX 4090: ~190 tokens/sec

I don't have the numbers handy, but there is notable pre-fill latency on the M3; once it's running, the delay is negligible. The RTX, unsurprisingly, is superior all around performance-wise, but I use that computer for gaming and image gen work so I can't dedicate it as a server, and, especially when it's warmer, the heat generated under heavy loads is noticeable.