Hacker News new | ask | show | jobs
by sidnb13 975 days ago
Yep, batching is a feature I really wish the OpenAI API had. That and the ability to intelligently cache frequently used prompts. Much easier to achieve this with a hosted OS model, so I guess it's a speed + customizability/cost tradeoff for the time being.
1 comments

imo they dont have batching because they pack sequences before passing through the model. so a single sequence in a batch on OpenAI might have requests from multiple customers in it
Ah that would make sense. Similar to vLLM which does dynamic packing.