by
npodbielski
30 days ago
Drop-in replacement for what, exactly? Can I use it with llama.cpp and Vulkan? Or with vLLM and ROCm?
pythongiant
30 days ago
KVBoost is a drop-in replacement for AutoModelForCausalLM. Same API surface (KVBoost.from_pretrained(...), engine.generate(...)), but with cross-request KV reuse, FlashAttention-2, AWQ layer streaming, and speculative decoding bolted on.
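To make "cross-request KV reuse" concrete, here is a toy sketch of the general idea: when two requests share a token prefix (say, the same system prompt), the second one only needs to compute the part it does not share. This is not KVBoost's code. `ToyPrefixCache`, `prefill`, and the string placeholders standing in for real key/value tensors are all hypothetical names made up for illustration.

```python
from typing import Dict, List, Tuple


class ToyPrefixCache:
    """Toy stand-in for cross-request KV reuse (hypothetical, not KVBoost's API)."""

    def __init__(self) -> None:
        # Maps a token prefix to a placeholder for its cached KV state.
        self._store: Dict[Tuple[int, ...], str] = {}
        # Counts tokens whose state had to be computed rather than reused.
        self.tokens_computed = 0

    def prefill(self, tokens: List[int]) -> int:
        """Process a request and return how many leading tokens came from cache."""
        # Find the longest prefix of this request that is already cached.
        reused = 0
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._store:
                reused = end
                break
        # "Compute" only the uncached suffix, caching each new prefix on the way.
        for end in range(reused + 1, len(tokens) + 1):
            self._store[tuple(tokens[:end])] = f"kv[:{end}]"
            self.tokens_computed += 1
        return reused


cache = ToyPrefixCache()
system_prompt = [101, 7, 8, 9]                   # made-up token ids
first = cache.prefill(system_prompt + [42, 43])  # nothing cached yet
second = cache.prefill(system_prompt + [55])     # shares the 4-token prefix
print(first, second, cache.tokens_computed)      # prints: 0 4 7
```

A real engine would store per-layer key/value tensors and typically match prefixes in fixed-size blocks rather than token by token, but the saving has the same shape: the second request pays for one new token instead of five.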