| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ikhatri 1108 days ago
	You can use the onnx cpu runtime in python or c++ too. It doesn’t have to be rust. And if you want GPU support you can even run models saved in the onnx format on Nvidia GPUs with the TensorRT runtime. Honestly while ggml is super cool. It started as a hobby project and you probably shouldn’t use it in production. ONNX has been the defacto standard for ML inference for years. What it is missing (compared to ggml) is 2-6bit inference which is helpful for large scale transformers on edge devices (and is what helped ggml gain adoption so fast).

1 comments

touisteur 1108 days ago

Intel OpenVINO is also quite punchy for CPU inference.

link

ikhatri 1108 days ago

Yeah I've heard of it but never used it. Looks like they have a backend/runtime for ONNX models as well (https://pypi.org/project/onnxruntime-openvino/) neat!

ONNX really is the universal format. If you can get your model exported to ONNX, running it on various platforms becomes much easier.*

*as long as every hardware platform supports the ops you use in your network and you're not doing anything too fancy/custom :P

link

touisteur 1107 days ago

Yeah I've only used it with networks in ONNX format (converted from tensorflow or torch). I was looking for high perf low latency / real-time, the C or C++ APIs for OpenVINO are quite OK if you spend some time playing with it. I hope Intel keeps investing on it...

Edit: often if you go through the ONNX intermediate format, be prepared to perform some 'network surgery' to clean up some conversion cruft, but also to remove training-only stuff left in the network...

link