by embedding-shape
228 days ago
GPT-OSS-120b/20b is probably the best you can run on your own hardware today. Be careful with the quantized versions though, as they're really horrible compared to the native MXFP4. I haven't looked into this particular case, but Ollama tends to hide its quantizations for some reason, so most people who could be running 20B with MXFP4 are still on Q8 and getting much worse results than they could.
Most gpt-oss GGUF files online have parts of their weights quantized to q8_0, and we've seen folks get some strange results from these models. If you're importing these to Ollama to run, the output quality may decrease.
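
If you want to check what a given GGUF file actually contains before importing it, something like the rough sketch below might help. It assumes the third-party `gguf` Python package (the reader that ships with llama.cpp) is installed and that its `GGUFReader` exposes `tensors` entries with `name` and `tensor_type` fields. The helper names `summarize_tensor_types` and `read_tensor_types` are made up for illustration, and the exact type names you see (for example `MXFP4` or `Q8_0`) depend on the package version.

```python
from collections import Counter


def summarize_tensor_types(tensors):
    """Count tensors per quantization type.

    `tensors` is an iterable of (tensor_name, type_name) pairs.
    Returns a Counter mapping type_name -> number of tensors.
    """
    return Counter(type_name for _name, type_name in tensors)


def read_tensor_types(path):
    """Return (tensor_name, type_name) pairs for a GGUF file.

    Assumption: the third-party `gguf` package is installed
    (pip install gguf). It is imported lazily so the counting
    helper above works without it.
    """
    from gguf import GGUFReader

    reader = GGUFReader(path)
    return [(t.name, t.tensor_type.name) for t in reader.tensors]
```

Calling `summarize_tensor_types(read_tensor_types("model.gguf"))` (with `model.gguf` standing in for your own file) gives a per-type tensor count. A file that shows many `Q8_0` tensors where you expected `MXFP4` is probably one of the requantized uploads mentioned above.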