by pythongiant
33 days ago
KVBoost is a drop-in replacement for AutoModelForCausalLM. Same API surface (KVBoost.from_pretrained(...), engine.generate(...)), but with cross-request KV reuse, FlashAttention-2, AWQ layer streaming, and speculative decoding bolted on.
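To make "cross-request KV reuse" concrete, here is a toy sketch of the general idea (prefix caching keyed on token chunks). This is not KVBoost's code or API: ToyPrefixCache, prefill, and chunk_size are hypothetical names made up for illustration, and the "KV" entries are placeholder strings rather than real tensors.

```python
from typing import Dict, List, Tuple


class ToyPrefixCache:
    """Hypothetical illustration of prefix-based KV reuse (not KVBoost's implementation)."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        # Maps a token prefix (whose length is a multiple of chunk_size)
        # to a placeholder standing in for its cached KV tensors.
        self.store: Dict[Tuple[int, ...], str] = {}

    def lookup(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached KV."""
        matched = 0
        for end in range(self.chunk_size, len(tokens) + 1, self.chunk_size):
            if tuple(tokens[:end]) in self.store:
                matched = end
            else:
                break  # a prefix must match contiguously from position 0
        return matched

    def insert(self, tokens: List[int]) -> None:
        """Record every chunk-aligned prefix of this request."""
        for end in range(self.chunk_size, len(tokens) + 1, self.chunk_size):
            self.store.setdefault(tuple(tokens[:end]), f"kv[0:{end}]")


def prefill(cache: ToyPrefixCache, tokens: List[int]) -> Tuple[int, int]:
    """Return (tokens reused from cache, tokens that still need a forward pass)."""
    reused = cache.lookup(tokens)
    computed = len(tokens) - reused
    cache.insert(tokens)
    return reused, computed


cache = ToyPrefixCache(chunk_size=4)
first = prefill(cache, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])         # cold cache
second = prefill(cache, [1, 2, 3, 4, 5, 6, 7, 8, 99, 100, 101])  # shares 8 tokens
print(first, second)
```

The second request shares its first eight tokens with the first one, so only the three trailing tokens would need a prefill pass. A real engine stores per-layer key/value tensors instead of strings and has to deal with eviction and positional encodings, none of which is modeled here.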