Hacker News new | ask | show | jobs
by ollin 44 days ago
Most ONNX files are fp32, but the ONNX format actually allows fp16, int8, etc. as well (see onnx.proto for the full list of dtypes [1] - they even have fp8/fp4 these days!). I ended up switching over to fp16 ONNX models for my own web-based inference project since the quality is ~identical and page loads get 2x faster.

[1] https://github.com/onnx/onnx/blob/main/onnx/onnx.proto#L605

1 comments

Thanks for the pointer actually. I need to take a look at this version of the spec.