Hacker News
by Octoth0rpe 87 days ago
> A single patched llama-server runs on K3s, providing both generation with speculative decoding (~100 tok/s)
There seems to be at least some detail on that point.