by astronautas 1002 days ago
Make it for Go, and I am sold. Running ML models in Go services is still an unsolved problem.
2 comments

We have a similar high-performance AI stack written in Go, capable of loading many different models from different frameworks. It is the work of several years. I just saw your comment and was reminded of our company-internal discussion about releasing everything under an open source license. Thanks for reminding me :) What are your use cases?
Wow, make it open source quickly!!! :hype:. Our use case is a classic Python REST API for model serving, but with very tight latency constraints. As such, rewriting in a more performant backend language, e.g. Go or Rust, would substantially reduce resource usage (by reducing the need for horizontal scaling). Pre-baked model serving frameworks, e.g. Nvidia's Triton, aren't an option, since we have to query a feature store and do some input feature tracking in between. Go seemed like an efficient, developer-friendly choice, but there aren't any well-maintained model inference libraries in Go to this day...
We used Triton Inference Server (with a Golang sidecar to translate requests) for model serving and a separate Go app that handled receiving the request, fetching features, sending to Triton, doing other stuff with the response, serving. This scaled to 100k QPS with pretty good performance but does require some hops.

In general writing pure Go inference libraries sucks. Not easy to do array/vector manipulation, not easy to do SIMD/CUDA acceleration, cgo is not go, etc. I wrote a fast XGBoost library at least (https://github.com/stillmatic/arboreal) - it's on par with C implementations, but doing anything more complex is going to be tricky.
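
To give a sense of why tree models are one of the more tractable cases for pure Go, here is a rough sketch of evaluating a small boosted-tree ensemble with nothing but slices and a loop. The node layout, the `tree` and `ensemble` types, and the "go left when the feature is below the threshold" rule are assumptions made for this example; this is not how arboreal or XGBoost actually store or load models.

```go
package main

import "fmt"

// node is one entry in a flattened decision tree.
// This layout is hypothetical, chosen only for illustration.
type node struct {
	feature   int     // index into the feature vector; -1 marks a leaf
	threshold float32 // go left when x[feature] < threshold
	left      int     // index of the left child in the slice
	right     int     // index of the right child in the slice
	value     float32 // output, used only at leaves
}

// tree stores its nodes in one slice, with the root at index 0.
type tree []node

// predict walks from the root to a leaf and returns the leaf value.
func (t tree) predict(x []float32) float32 {
	i := 0
	for t[i].feature >= 0 {
		if x[t[i].feature] < t[i].threshold {
			i = t[i].left
		} else {
			i = t[i].right
		}
	}
	return t[i].value
}

// ensemble sums the outputs of all trees on top of a base score.
type ensemble struct {
	trees []tree
	base  float32
}

func (e ensemble) predict(x []float32) float32 {
	sum := e.base
	for _, t := range e.trees {
		sum += t.predict(x)
	}
	return sum
}

func main() {
	// A depth-1 tree: split on feature 0 at 1.0.
	stump := tree{
		{feature: 0, threshold: 1.0, left: 1, right: 2},
		{feature: -1, value: -0.5},
		{feature: -1, value: 0.5},
	}
	e := ensemble{trees: []tree{stump, stump}, base: 0.25}
	fmt.Println(e.predict([]float32{2.0}))
}
```

Tree traversal is branchy scalar code with no matrix math, so it avoids the array manipulation and SIMD/CUDA gaps listed above; neural network inference does not have that luxury.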

Cool, thanks for sharing!
I’ve also run models in Go, transformers, even T5. There wasn’t that much overhead, maybe some annoying compilation stuff, but nothing crazy.

This was tensorflow btw which has Go bindings support.

It is a smart & worthwhile move; we also needed to drop Python for performance/cost gains.

Eh, awesome! It's this one, right? https://github.com/galeone/tfgo. Quite a lot of stars.
I think just native https://pkg.go.dev/github.com/tensorflow/tensorflow/tensorfl... but tfgo looks interesting.

Actually, the docs around this weren’t great. I took the train-in-Python & inference-in-Go approach, and only for versions greater than TF2.

Write a blog post then about this! I can tell you it is hardly a solved problem.
This seems to be a reasonable approach for Go, but you did need to carry a lot in your containerized environment (Go tends to have very lean containers, and this approach requires a fat container with CUDA, PyTorch, Python, etc.).