by cthalupa
23 days ago
A variety of inference engines support this regardless of whether Ampere has native FP8: llama.cpp will do it quite happily, and in vLLM you can use W8A16 quantization too. There are a whole lot of ways to quantize models in general.
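To make the W8A16 idea concrete, here's a rough stdlib-only sketch of weight-only quantization: weights are stored as int8 with one scale per row and dequantized to 16-bit floats at compute time, so no native 8-bit float support is needed. The helper names (`to_fp16`, `quantize_row_int8`, `w8a16_linear`) are made up for illustration. This is not how llama.cpp or vLLM actually implement it; real engines use fused GPU kernels and usually finer-grained scaling schemes.

```python
import random
import struct

def to_fp16(v):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", v))[0]

def quantize_row_int8(row):
    """Symmetric quantization of one weight row to int8 values plus one scale."""
    max_abs = max(abs(v) for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale

def w8a16_linear(x, q_rows, scales):
    """Compute y = x @ W^T with W stored as int8 and dequantized to fp16 on the fly.

    Assumption: accumulation happens in ordinary Python floats for simplicity.
    """
    out = []
    for q, scale in zip(q_rows, scales):
        w16 = [to_fp16(qi * scale) for qi in q]  # dequantize at compute time
        out.append(sum(xi * wi for xi, wi in zip(x, w16)))
    return out

random.seed(0)
# Hypothetical toy layer: 8 output rows, 128 inputs.
weights = [[random.gauss(0, 1) for _ in range(128)] for _ in range(8)]
x = [to_fp16(random.gauss(0, 1)) for _ in range(128)]  # 16-bit activations

quantized = [quantize_row_int8(row) for row in weights]
q_rows = [q for q, _ in quantized]
scales = [s for _, s in quantized]

y_quant = w8a16_linear(x, q_rows, scales)
y_ref = [sum(xi * wi for xi, wi in zip(x, row)) for row in weights]
max_err = max(abs(a - b) for a, b in zip(y_quant, y_ref))
print(f"max abs error vs. full-precision weights: {max_err:.4f}")
```

The point is just that the 8-bit part only affects storage: the math itself runs in a format the hardware already handles, which is why this works on GPUs without FP8 units.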