by cthalupa 5 days ago
Prefill is another advantage vs. Apple. It's dramatically faster on a Spark than it is even on an M5 Max.

Same model, same quant, same query, and settings matched as closely as I can get them from vLLM: for workloads with large prompts and low cacheability, one of my Sparks will often be done responding before the MBP is done with prefill.
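
To make the arithmetic concrete, here is a back-of-the-envelope sketch of why that can happen. It assumes a simple latency model (prefill time plus decode time), and every throughput figure in it is a made-up placeholder, not a measurement of either machine. The function name `response_time_s` and its parameters are hypothetical.

```python
def response_time_s(prompt_tokens, output_tokens, prefill_tok_s, decode_tok_s):
    """Return (prefill seconds, total seconds) under a simple two-phase model.

    Assumes nothing is served from a prefix cache, so the whole prompt
    has to be prefilled before the first output token appears.
    """
    prefill_s = prompt_tokens / prefill_tok_s
    decode_s = output_tokens / decode_tok_s
    return prefill_s, prefill_s + decode_s


# Hypothetical workload: a large prompt and a short reply.
PROMPT_TOKENS = 60_000
OUTPUT_TOKENS = 500

# Placeholder rates (assumptions): machine A prefills quickly but decodes
# slowly, machine B prefills slowly but decodes quickly.
a_prefill, a_total = response_time_s(PROMPT_TOKENS, OUTPUT_TOKENS,
                                     prefill_tok_s=2_000, decode_tok_s=20)
b_prefill, b_total = response_time_s(PROMPT_TOKENS, OUTPUT_TOKENS,
                                     prefill_tok_s=400, decode_tok_s=40)

print(f"A: prefill {a_prefill:.1f}s, total {a_total:.1f}s")
print(f"B: prefill {b_prefill:.1f}s, total {b_total:.1f}s")
print("A finishes before B's prefill ends:", a_total < b_prefill)
```

With a long enough prompt the prefill term dominates, so a machine with slower decode can still finish first. With short prompts or high cache hit rates the decode term matters more and the picture can flip.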