Hacker News new | ask | show | jobs
by astronautas 994 days ago
Wow, make it open source quickly!!! :hype:. It's a classic Python REST API for model serving. But we have very low latency constraints. As such, rewriting in more high performant backend languages e.g. Go or Rust would substantially reduce resource usage (by reducing horizontal scaling need). Pre-baked model serving frameworks e.g. Nvidia's Triton aren't an option, since we have to query a feature store, and do some input feature tracking in between. Go seemed like an efficient, developer friendly choice, but there aren't any well maintained model inference libraries in Go up to this day...
2 comments

We used Triton Inference Server (with a Golang sidecar to translate requests) for model serving and a separate Go app that handled receiving the request, fetching features, sending to Triton, doing other stuff with the response, serving. This scaled to 100k QPS with pretty good performance but does require some hops.

In general writing pure Go inference libraries sucks. Not easy to do array/vector manipulation, not easy to do SIMD/CUDA acceleration, cgo is not go, etc. I wrote a fast XGBoost library at least (https://github.com/stillmatic/arboreal) - it's on par with C implementations, but doing anything more complex is going to be tricky.

Cool, thanks for sharing!
I’ve also ran models in Go, transformers even T5. There wasn’t that much overhead maybe some annoying compilation stuff but nothing crazy.

This was tensorflow btw which has Go bindings support.

It is a smart & worthwhile move, we also needed to drop python for performance/cost gains.

eh, awesome! Seems this one, right? https://github.com/galeone/tfgo. Quite many stars.
I think just native https://pkg.go.dev/github.com/tensorflow/tensorflow/tensorfl... but tfgo looks interesting.

Actually the docs around this weren’t great. Took the train-in-python & inference-in-go approach. And only for versions greater than tf2

Write a blog post then about this! I can tell you it is hardly a solved problem.