Hacker News
by advaith08 975 days ago
IMO they don't have batching because they pack sequences before passing them through the model, so a single sequence in a batch on OpenAI might contain requests from multiple customers.
1 comment

Ah, that would make sense. Similar to vLLM, which does dynamic packing.
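
For anyone wondering what "packing" means here, a rough sketch in plain Python. This is only an illustration of the general idea, not how OpenAI or vLLM actually implement it. The names `pack_requests` and `attention_mask` and the greedy first-fit strategy are my own assumptions. Several requests are concatenated into one row, and a per-token segment id is used to build a mask so tokens from one request never attend to tokens from another.

```python
from typing import Dict, List


def pack_requests(requests: List[List[int]], max_len: int) -> List[Dict[str, List[int]]]:
    """Greedy first-fit packing of token lists into rows of at most max_len tokens.

    Hypothetical helper for illustration. Each row keeps the packed tokens and a
    parallel list of segment ids (the index of the request each token came from).
    """
    rows: List[Dict[str, List[int]]] = []
    for req_id, tokens in enumerate(requests):
        if len(tokens) > max_len:
            raise ValueError(f"request {req_id} is longer than max_len")
        for row in rows:
            # Reuse the first existing row with enough free space.
            if len(row["tokens"]) + len(tokens) <= max_len:
                break
        else:
            # No row had room (or there are no rows yet), so open a new one.
            row = {"tokens": [], "segments": []}
            rows.append(row)
        row["tokens"].extend(tokens)
        row["segments"].extend([req_id] * len(tokens))
    return rows


def attention_mask(segments: List[int]) -> List[List[bool]]:
    """Causal, block-diagonal mask: position i may attend to position j only if
    j is not after i and both tokens belong to the same request."""
    n = len(segments)
    return [[segments[i] == segments[j] and j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Four requests (possibly from different customers) packed into rows of 6 tokens.
    rows = pack_requests([[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]], max_len=6)
    for row in rows:
        print(row["tokens"], row["segments"])
    for line in attention_mask(rows[0]["segments"]):
        print("".join("x" if allowed else "." for allowed in line))
```

The mask is the important part: without it, packed requests would leak into each other's context. Real systems would also reset position ids per segment and use fused attention kernels rather than a dense mask, which this sketch leaves out.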