Hacker News
by pythongiant 33 days ago
KVBoost is a drop-in replacement for AutoModelForCausalLM. Same API surface (KVBoost.from_pretrained(...), engine.generate(...)), but with cross-request KV reuse, FlashAttention-2, AWQ layer streaming, and speculative decoding bolted on.
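For readers unfamiliar with the term, cross-request KV reuse usually means the following. When a new request shares a token prefix with an earlier one (for example, the same system prompt), the attention keys and values for that prefix are served from a cache instead of being recomputed. This is valid because in a causal decoder the KV at position i depends only on tokens 0..i, so an exact prefix match produces identical KV. Below is a toy, pure-Python sketch of block-aligned prefix caching. It is not KVBoost's code. `PrefixKVCache`, `prefill`, `lookup`, and `_compute_kv` are hypothetical names, and each "KV" entry is a placeholder `(token, position)` tuple standing in for real tensors.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one token's key/value tensors: (token_id, position).
KV = Tuple[int, int]


class PrefixKVCache:
    """Toy cross-request KV cache keyed on block-aligned token prefixes (illustrative only)."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        # Maps a block-aligned token prefix to the KV entries for exactly that prefix.
        self._store: Dict[Tuple[int, ...], List[KV]] = {}
        self.tokens_computed = 0  # counts simulated forward-pass work actually done

    def _compute_kv(self, tokens: List[int], start: int) -> List[KV]:
        # Placeholder for a real forward pass over tokens[start:].
        self.tokens_computed += len(tokens) - start
        return [(tok, pos) for pos, tok in enumerate(tokens) if pos >= start]

    def lookup(self, tokens: List[int]) -> List[KV]:
        # Try block boundaries from longest to shortest and return the first hit.
        n_full = len(tokens) // self.block_size * self.block_size
        for end in range(n_full, 0, -self.block_size):
            hit = self._store.get(tuple(tokens[:end]))
            if hit is not None:
                return list(hit)
        return []

    def prefill(self, tokens: List[int]) -> Tuple[List[KV], int]:
        """Return KV for every token plus how many tokens were reused from cache."""
        cached = self.lookup(tokens)
        reused = len(cached)
        kv = cached + self._compute_kv(tokens, reused)
        # Record every block-aligned prefix so later requests can share it.
        for end in range(self.block_size, len(tokens) + 1, self.block_size):
            self._store.setdefault(tuple(tokens[:end]), kv[:end])
        return kv, reused


if __name__ == "__main__":
    cache = PrefixKVCache(block_size=4)
    system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]
    _, reused_a = cache.prefill(system_prompt + [10, 11])
    _, reused_b = cache.prefill(system_prompt + [20, 21, 22])
    print(reused_a, reused_b, cache.tokens_computed)  # prints "0 8 13"
```

The second request reuses the 8-token shared prefix and computes only its 3 new tokens. Block alignment keeps the number of cache keys small. A real engine would additionally need hashed keys, eviction, and actual tensor storage, none of which this sketch attempts.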