Hacker News
advaith08, 975 days ago
IMO, they don't have batching because they pack sequences before passing them through the model, so a single sequence in a batch on OpenAI might contain requests from multiple customers.
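To make "packing" concrete, here is a minimal sketch of the general technique, assuming requests arrive as lists of token ids and each packed row has a fixed length. It is not OpenAI's implementation (which isn't public). The helper names `pack_sequences` and `attention_mask` are hypothetical, and greedy first-fit is just one simple packing strategy. Each token gets a segment id so that attention can be restricted to tokens from the same request; that restriction is what stops one customer's prompt from leaking into another's inside a shared row.

```python
from typing import List, Tuple

def pack_sequences(
    requests: List[List[int]], max_len: int, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: greedy first-fit packing of token lists into rows of max_len.

    Returns (rows, segment_ids). segment_ids[r][t] is the 1-based index of the
    request that token t in row r came from, or 0 for padding.
    """
    rows: List[List[int]] = []
    segs: List[List[int]] = []
    for req_idx, tokens in enumerate(requests, start=1):
        if len(tokens) > max_len:
            raise ValueError(f"request {req_idx} is longer than max_len={max_len}")
        # Put the request into the first existing row with enough free space.
        for row, seg in zip(rows, segs):
            if len(row) + len(tokens) <= max_len:
                row.extend(tokens)
                seg.extend([req_idx] * len(tokens))
                break
        else:
            # No row had room, so start a new one.
            rows.append(list(tokens))
            segs.append([req_idx] * len(tokens))
    # Pad every row to max_len; padding gets segment id 0.
    for row, seg in zip(rows, segs):
        pad = max_len - len(row)
        row.extend([pad_id] * pad)
        seg.extend([0] * pad)
    return rows, segs

def attention_mask(seg: List[int]) -> List[List[bool]]:
    """Causal, block-diagonal mask: token i may attend to token j only if
    j <= i and both tokens belong to the same (non-padding) request."""
    n = len(seg)
    return [[j <= i and seg[i] == seg[j] and seg[i] != 0 for j in range(n)] for i in range(n)]

rows, segs = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]], max_len=6)
print(rows)  # [[1, 2, 3, 4, 5, 10], [6, 7, 8, 9, 0, 0]]
```

Three of the four requests end up sharing the first row, but the mask keeps their attention separate. A real serving stack would also reset position ids at each segment boundary, which this sketch leaves out.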
sidnb13, 972 days ago
Ah, that would make sense. It's similar to vLLM, which does dynamic packing.
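vLLM's scheduling is usually described as continuous (iteration-level) batching: after every decode step, finished requests leave the batch and waiting requests take the freed slots, instead of the whole batch waiting for its slowest member. Below is a toy simulation of that idea. It is not vLLM's code or API. The `Request` class and the `continuous_batching` function are hypothetical, and each request is assumed to need at least one generated token.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List

@dataclass
class Request:
    rid: str                 # hypothetical request id
    tokens_to_generate: int  # assumed >= 1
    generated: int = 0

def continuous_batching(requests: List[Request], max_batch: int) -> List[List[str]]:
    """Toy scheduler: returns, for each decode step, the ids of the running requests."""
    waiting: Deque[Request] = deque(requests)
    running: List[Request] = []
    schedule: List[List[str]] = []
    while waiting or running:
        # Fill free slots with waiting requests before every step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        schedule.append([r.rid for r in running])
        # One decode step: every running request produces one token.
        for r in running:
            r.generated += 1
        # Finished requests leave right away, freeing their slots for the next step.
        running = [r for r in running if r.generated < r.tokens_to_generate]
    return schedule

reqs = [Request("A", 1), Request("B", 3), Request("C", 2)]
print(continuous_batching(reqs, max_batch=2))
# [['A', 'B'], ['B', 'C'], ['B', 'C']]
```

C joins as soon as A finishes, so all three requests complete in 3 steps. A static batch of {A, B} would hold its slots for 3 steps and then run C for 2 more, for 5 steps in total.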