Hacker News new | ask | show | jobs
by esperent 1154 days ago
> people who inference their models at scale

What does "inferencing models at scale" mean?

3 comments

Actually using the model, instead of doing inference a few times when writing your paper and then that's it.
Optimizing it to be cheaper to run (e.g. OpenAI saved a lot of money with the turbo variant that ChatGPT uses, relative to the original GPT-3 models).
Running server farms to quickly give answers to users, like bing or ChatGPT.

The models produced have to be fast and efficient to support capacity/cost, which has some detriment to quality/accuracy.