by cthalupa
5 days ago
Prefill is another advantage vs. Apple. It's way, way faster on a Spark than it is even on an M5 Max. Same model, same quant, same query, with settings matched as closely as I can get from vLLM, and for workloads with large prompts and low cacheability, one of my Sparks will often be done responding before the MBP is done with prefill.