Hacker News


	by omgwin 154 days ago
	It's been mentioned that this model is MLA-capable (Multi-head Latent Attention), but the default vLLM parameters don't seem to enable MLA. I'm seeing a KV-cache footprint of ~0.91 MB per token right now. Are you getting MLA to work?
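One rough way to tell whether MLA is active is to compare the observed per-token footprint with what each cache layout predicts. The sketch below assumes a DeepSeek-style MLA layout, where each layer caches one compressed latent of size `kv_lora_rank` plus a single RoPE key shared across heads. Without MLA, the cache generally holds full per-head K and V vectors. The function names and every shape value are hypothetical placeholders. They are not taken from the model in this thread or from vLLM.

```python
# Per-token KV-cache size: standard attention vs. MLA.
# All shapes used below are hypothetical placeholders, not the model discussed here.

def kv_bytes_per_token_mha(num_layers: int, num_kv_heads: int,
                           k_head_dim: int, v_head_dim: int,
                           dtype_bytes: int) -> int:
    """Bytes cached per token when every KV head stores full K and V vectors."""
    return num_layers * num_kv_heads * (k_head_dim + v_head_dim) * dtype_bytes


def kv_bytes_per_token_mla(num_layers: int, kv_lora_rank: int,
                           rope_head_dim: int, dtype_bytes: int) -> int:
    """Bytes cached per token when each layer stores only the compressed
    KV latent plus one decoupled RoPE key shared across heads (assumed layout)."""
    return num_layers * (kv_lora_rank + rope_head_dim) * dtype_bytes


if __name__ == "__main__":
    # Hypothetical shape: 32 layers, 32 KV heads of dim 128, fp16/bf16 (2 bytes).
    mha = kv_bytes_per_token_mha(32, 32, 128, 128, 2)
    # Hypothetical MLA shape: latent rank 512, RoPE key dim 64.
    mla = kv_bytes_per_token_mla(32, 512, 64, 2)
    print(f"standard: {mha / 2**20:.3f} MiB/token")
    print(f"MLA:      {mla / 2**20:.3f} MiB/token")
    print(f"ratio:    {mha / mla:.1f}x")
```

To use it, plug in the real values from the model's config. If the measured ~0.91 MB per token lands near the first formula rather than the second, the serving stack is probably not using an MLA-aware cache.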