Hacker News
by omgwin 154 days ago
It's been mentioned that this model is MLA-capable, but it seems like the default vLLM params don't use MLA. I'm seeing a KV cache footprint of ~0.91 MB per token right now. Are you getting MLA to work?
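For anyone wanting to sanity-check a per-token number like that, here is a rough sketch of the arithmetic. It assumes the usual accounting: standard attention caches one K and one V vector per KV head per layer, while MLA caches a single compressed latent plus a small decoupled RoPE key per layer. The function names and the dimensions in the example calls are hypothetical placeholders, not this model's actual config; substitute the values from the model's own config file.

```python
def mha_kv_bytes_per_token(n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    """Per-token KV cache size for standard (non-MLA) attention.

    Each layer stores one K and one V vector per KV head, hence the factor 2.
    dtype_bytes=2 assumes an fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes


def mla_kv_bytes_per_token(n_layers, kv_lora_rank, rope_head_dim, dtype_bytes=2):
    """Per-token KV cache size when the MLA latent is cached instead.

    Each layer stores one compressed KV latent (kv_lora_rank values) plus
    one decoupled RoPE key (rope_head_dim values), shared across heads.
    """
    return n_layers * (kv_lora_rank + rope_head_dim) * dtype_bytes


# Hypothetical dimensions, for illustration only.
full = mha_kv_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128)
latent = mla_kv_bytes_per_token(n_layers=32, kv_lora_rank=512, rope_head_dim=64)

print(f"full KV cache:  {full / 2**20:.3f} MiB per token")
print(f"MLA latent:     {latent / 2**20:.3f} MiB per token")
print(f"ratio:          {full / latent:.1f}x")
```

If the measured footprint is close to the first number for the model's real dimensions, the full K/V tensors are probably being cached rather than the MLA latent.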