


	by mseri 583 days ago
	You can choose the quantization by appending the right tag to the model name, but the more advanced features are poorly supported: flash attention is only enabled through a special flag, and KV cache quantization, which matters for large contexts, is not available at all.