by Patrick_Devine
226 days ago
The default gpt-oss weights on Ollama use MXFP4 for the feed-forward network and BF16 for the attention weights. The default weights for llama.cpp quantize the attention tensors to q8_0, which is why llama.cpp can eke out a little more performance at the cost of worse output. If you are using this for coding, you definitely want the better output. You can run `ollama show -v gpt-oss:120b` to see the datatype of each tensor.
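
To put rough numbers on the size side of that trade-off, here is a back-of-the-envelope sketch of storage cost per weight for the three formats mentioned above. It assumes the usual block layouts (q8_0: 32 int8 values plus one 16-bit scale per block; MXFP4: 32 4-bit values plus one 8-bit shared scale per block). The helper names and the 1M-weight tensor are made up for illustration, and this says nothing about actual speed or output quality.

```python
# Rough storage cost per weight for the formats discussed above.
# Assumed block layouts (not read from any model file):
#   q8_0  -> 32 x int8 values + one 16-bit scale per block
#   MXFP4 -> 32 x 4-bit values + one 8-bit shared scale per block
#   BF16  -> plain 16 bits per weight, no blocks

def bits_per_weight(element_bits, scale_bits, block_size):
    """Element bits plus the per-block scale amortized over the block."""
    return element_bits + scale_bits / block_size

FORMATS = {
    "BF16": 16.0,
    "q8_0": bits_per_weight(8, 16, 32),
    "MXFP4": bits_per_weight(4, 8, 32),
}

def tensor_bytes(n_weights, fmt):
    """Approximate bytes needed to store n_weights in the given format."""
    return n_weights * FORMATS[fmt] / 8

if __name__ == "__main__":
    n = 1_000_000  # hypothetical tensor size, for illustration only
    for name, bpw in FORMATS.items():
        print(f"{name:6s} {bpw:5.2f} bits/weight  {tensor_bytes(n, name):>12,.0f} bytes")
```

Under those assumptions q8_0 attention tensors take roughly half the memory (and memory bandwidth) of BF16 ones, which is consistent with llama.cpp being a bit faster while giving up some precision.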