|
|
|
|
|
by ikhatri
1108 days ago
|
|
You can use the onnx cpu runtime in python or c++ too. It doesn’t have to be rust. And if you want GPU support you can even run models saved in the onnx format on Nvidia GPUs with the TensorRT runtime. Honestly while ggml is super cool. It started as a hobby project and you probably shouldn’t use it in production. ONNX has been the defacto standard for ML inference for years. What it is missing (compared to ggml) is 2-6bit inference which is helpful for large scale transformers on edge devices (and is what helped ggml gain adoption so fast). |
|