Hacker News new | ask | show | jobs
by functional_dev 81 days ago
I did not know, that NVFP4 was handled at the silicon level... until I dug deeper here - https://vectree.io/c/llm-quantization-from-weights-to-bits-g...
1 comments

I still don't think I understand it. I saw those nvfp4 models up by chance yesterday and tried them on my Linux PC with a 5060TI 16gb. Ollama refused to pull them saying they were macOS only.

I assumed it was a meta-data bug and posted an issue, but apparently nvfp4 doesn't necessarily mean nvidia-fp4.

https://github.com/ollama/ollama/issues/15149

They are nvidia-fp4 weights, but CUDA support isn't _quite_ ready yet, but we've got that cooking.