RTX 4090: ~190 tokens/sec
I don't have the numbers on hand, but prefill latency on the M3 is noticeable; once generation is running, the delay is negligible.
The RTX is, unsurprisingly, superior all around performance-wise, but I use that computer for gaming and image gen work, so I can't dedicate it as a server, and the heat it generates under heavy load is noticeable, especially when it's warmer.
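If anyone wants to put a number on the prefill delay separately from the tokens/sec figure, here is a rough sketch of how I'd time it. It only assumes you can iterate over tokens as they stream in; `measure_stream` and `fake_stream` are names I made up for illustration, not part of any runtime's API, and the sleep values in the stand-in are arbitrary.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Split a streamed generation into prefill latency and decode throughput."""
    start = clock()
    first = None  # time the first token arrived
    last = None   # time the most recent token arrived
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    return {
        # Time spent before the first token: roughly the prefill cost.
        "prefill_latency_s": first - start,
        # Tokens after the first one, divided by the time they took.
        "decode_tokens_per_s": (count - 1) / decode_time if decode_time > 0 else float("nan"),
        "tokens": count,
    }


def fake_stream(n: int = 5) -> Iterator[str]:
    """Hypothetical stand-in for a real model stream."""
    time.sleep(0.2)  # pretend prefill
    for i in range(n):
        time.sleep(0.01)  # pretend per-token decode
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream()))
```

Swap `fake_stream()` for whatever token iterator your setup exposes. Keeping the two numbers separate matters because a long prompt can make the first token slow even when the steady-state tokens/sec looks fine.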