by magicalhippo 355 days ago
> You have to classify inputs one LLM call at a time?

Yes, each input is still its own prompt, but several prompts can be batched together and fed through the neural network in one pass, so LLM libraries might support that.

See, for example, this[1] article, which gives a brief overview of batching calls using vLLM.
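To make that concrete, here's a rough sketch of what batched classification could look like. The label set, prompt wording, and helper names (`build_prompt`, `parse_label`, `classify_batch`, `make_vllm_generate`) are all made up for illustration; only `LLM`, `SamplingParams`, and `llm.generate` come from vLLM's offline inference API, and I haven't tuned any of this.

```python
from typing import Callable, List

# Hypothetical label set, purely for illustration.
LABELS = ["positive", "negative"]


def build_prompt(text: str) -> str:
    """Turn one input into one classification prompt."""
    return (
        f"Classify the text as one of: {', '.join(LABELS)}.\n"
        f"Text: {text}\nLabel:"
    )


def parse_label(completion: str) -> str:
    """Take the first word of the completion as the label, if it is a known one."""
    words = completion.strip().lower().split()
    first = words[0].strip(".,:;!\"'") if words else ""
    return first if first in LABELS else "unknown"


def classify_batch(
    texts: List[str],
    generate: Callable[[List[str]], List[str]],
) -> List[str]:
    """Classify many inputs with a single batched generate call."""
    prompts = [build_prompt(t) for t in texts]
    completions = generate(prompts)  # one call for the whole batch
    return [parse_label(c) for c in completions]


def make_vllm_generate(model_name: str) -> Callable[[List[str]], List[str]]:
    """Build a batched generate function backed by vLLM (needs vLLM installed)."""
    from vllm import LLM, SamplingParams  # imported lazily on purpose

    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.0, max_tokens=4)

    def generate(prompts: List[str]) -> List[str]:
        # vLLM takes the whole list and schedules the batching internally.
        outputs = llm.generate(prompts, params)
        return [o.outputs[0].text for o in outputs]

    return generate
```

Usage would be something like `classify_batch(texts, make_vllm_generate(model_name))`, with `model_name` being whatever model you actually have available. Passing `generate` in as a plain function also means you can swap in a different backend, or a stub for testing, without touching the classification logic.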

[1]: https://medium.com/ubiops-tech/how-to-optimize-inference-spe...