by npodbielski 30 days ago
Drop-in replacement for what exactly? Can I use it with llama.cpp and Vulkan? Or vLLM and ROCm?

KVBoost is a drop-in replacement for Hugging Face Transformers' AutoModelForCausalLM. It has the same API surface (KVBoost.from_pretrained(...), engine.generate(...)), but with cross-request KV-cache reuse, FlashAttention-2, AWQ layer streaming, and speculative decoding bolted on.
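The thread does not say how KVBoost implements cross-request KV reuse. As a rough illustration of the general idea (often called prefix caching), here is a minimal toy sketch. It assumes requests that share a prompt prefix, such as the same system prompt, can reuse the key/value entries already computed for that prefix and only compute the new suffix. `PrefixKVCache` and `compute_kv` are hypothetical names, and the "KV" values are stand-ins for real attention tensors. This is not KVBoost's API or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class _Node:
    # Children keyed by the next token id.
    children: dict = field(default_factory=dict)
    # Stand-in for the key/value tensors computed for this token position.
    kv: object = None


class PrefixKVCache:
    """Toy trie (hypothetical) that reuses per-token KV entries across requests sharing a prefix."""

    def __init__(self, compute_kv):
        # compute_kv(prefix_tokens) -> KV stand-in for the last token of that prefix.
        self.root = _Node()
        self.compute_kv = compute_kv
        self.computed = 0  # positions actually computed (cache misses)

    def prefill(self, tokens):
        """Return (kv entries for every position, number of positions reused from the cache)."""
        node, kvs, reused = self.root, [], 0
        for i, tok in enumerate(tokens):
            child = node.children.get(tok)
            if child is None:
                # Miss: compute KV for this position and remember it for later requests.
                child = _Node(kv=self.compute_kv(tokens[: i + 1]))
                node.children[tok] = child
                self.computed += 1
            else:
                # Hit: this prefix was already processed by an earlier request.
                reused += 1
            kvs.append(child.kv)
            node = child
        return kvs, reused


if __name__ == "__main__":
    cache = PrefixKVCache(compute_kv=lambda prefix: ("kv", tuple(prefix)))
    system = [1, 2, 3, 4]  # shared "system prompt" token ids
    _, r1 = cache.prefill(system + [10, 11])
    _, r2 = cache.prefill(system + [20])
    print(r1, r2, cache.computed)  # 0 4 7
```

The second request reuses the four shared prefix positions and computes only its one new token. Real engines store these entries as GPU tensor blocks and must also handle eviction, but the lookup idea is the same.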