by mseri
583 days ago
You can choose the quantization by appending the right tag to the model name, but they don't support other, more advanced features that are genuinely useful (e.g. you need a special flag to enable flash attention, and you cannot use KV cache quantization for large contexts).