Hacker News
by varispeed 1203 days ago
I read a lot about training models and so on, but very little about inference.

Let's say you came up with a custom model that gives good results. How do you deploy that model so it can be used behind an API?

2 comments

There's no single answer to that, since different models are, well, different. Beyond just modalities (text input and image output? image input and video output?), there are different common underlying tools used to build them. And then, of course, what do you mean by API? How do you want to interact with it?

In general, you'd take a request that requires an inference step, invoke the model with some parameters and the input from that request, and return the output. Beyond that, you'd need more detail.
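To make that flow concrete, here's a rough sketch using only the Python standard library. Everything in it is an assumption for illustration: `toy_model` is a hypothetical stand-in for whatever you trained, and the JSON shape (`input`, `parameters`, `output`) is made up rather than any standard.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def toy_model(text: str, temperature: float = 1.0) -> dict:
    """Hypothetical stand-in for a trained model: labels text by its length."""
    label = "long" if len(text) > 20 else "short"
    return {"label": label, "temperature": temperature}


def handle_inference(body: bytes) -> bytes:
    """Request JSON -> model input and parameters -> model call -> response JSON."""
    request = json.loads(body)
    # "parameters" is optional; anything in it is passed to the model as keywords.
    output = toy_model(request["input"], **request.get("parameters", {}))
    return json.dumps({"output": output}).encode("utf-8")


class InferenceHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper: read the POST body, run inference, write the result."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = handle_inference(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Block forever, serving inference requests on localhost."""
    HTTPServer(("127.0.0.1", port), InferenceHandler).serve_forever()
```

Calling `serve()` and POSTing `{"input": "hello"}` to port 8000 would exercise the whole path. A real deployment would add input validation, error handling, batching, and so on, and the details depend on the model and framework.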

I specialize in this area and build a product for self-hosted inference.

The challenge in supporting a new model architecture is coding the preprocessing for inputs (like tokenization, or image resizing and color feature extraction) and the postprocessing for outputs (for example, entity recognition needs to look up the entities and align them with the text).

Once the pre/post-processing is coded for an architecture, serving a new model with that architecture is easy!
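As a rough illustration of that split (not the actual product code), here's a toy entity-recognition pipeline in plain Python. All the names are hypothetical: `tokenize` is the preprocessing, `align_entities` is the postprocessing, and `capitalized_word_model` is a stand-in for trained weights. The pipeline is written once per architecture, and the model is just a swappable callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, int, int]  # (word, start offset, end offset)


def tokenize(text: str) -> List[Token]:
    """Preprocessing: whitespace tokenization that keeps character offsets."""
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        pos = start + len(word)
        tokens.append((word, start, pos))
    return tokens


def align_entities(text: str, tokens: List[Token], tag_ids: List[int],
                   labels: List[str]) -> List[Dict]:
    """Postprocessing: look up label names and map tagged tokens back to the text."""
    entities = []
    for (_, start, end), tag_id in zip(tokens, tag_ids):
        label = labels[tag_id]
        if label != "O":  # "O" marks tokens outside any entity
            entities.append({"text": text[start:end], "label": label,
                             "start": start, "end": end})
    return entities


@dataclass
class TokenClassificationPipeline:
    """Pre/post-processing for one architecture; the model itself is swappable."""
    model: Callable[[List[str]], List[int]]
    labels: List[str]

    def __call__(self, text: str) -> List[Dict]:
        tokens = tokenize(text)
        tag_ids = self.model([word for word, _, _ in tokens])
        return align_entities(text, tokens, tag_ids, self.labels)


def capitalized_word_model(words: List[str]) -> List[int]:
    """Hypothetical stand-in for trained weights: tags capitalized words."""
    return [1 if w[:1].isupper() else 0 for w in words]


ner = TokenClassificationPipeline(model=capitalized_word_model,
                                  labels=["O", "ENTITY"])
```

Serving a different model with the same architecture then only means passing a different `model` (and its `labels`) to `TokenClassificationPipeline`; the tokenization and alignment code stays untouched.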