Hacker News new | ask | show | jobs
by omgwin 154 days ago
It's been mentioned that this model is MLA capable, but it seems like the default vLLM params don't use MLA. Seeing ~0.91MB KV Footprint per token right now. Are you getting MLA to work?