Hacker News
by mseri 583 days ago
You can choose the quantization by appending the right tag to the model name, but the more advanced features are awkward to use or missing (e.g. you need a special flag to enable flash attention, and you cannot use KV cache quantization for large contexts).
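
For a rough sense of why KV cache quantization matters at large contexts, here is a back-of-the-envelope sketch. The model dimensions are hypothetical and not taken from any specific model, and the 1 byte per element figure is an approximation that ignores the per-block scale overhead real quantized formats carry.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_element):
    """Approximate KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_element


# Hypothetical model shape, chosen only for illustration.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT_LEN = 32768

fp16_size = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT_LEN, 2)
# Assumption: ~1 byte per element for an 8-bit cache, ignoring scale overhead.
q8_size = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT_LEN, 1)

print(f"fp16 KV cache:  {fp16_size / 2**30:.1f} GiB")
print(f"8-bit KV cache: {q8_size / 2**30:.1f} GiB")
```

Under these assumptions the cache size scales linearly with context length, so halving the bytes per element roughly halves the memory needed for a long context.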