

by HellsMaddy 926 days ago

This reminds me of an idea I had for an OpenAI proxy that transparently handles batching of requests. The use case is that OpenAI has rate limits not only on tokens but also on requests per minute. By batching multiple requests together you can avoid hitting the request limit.

This isn’t really feasible to implement if your app runs on Lambda or edge functions, since the proxy has to hold pending requests in memory until a batch is ready; you’d need a persistent server.

Here’s a diagram I drew of a simple approach that came to mind: https://gist.github.com/b0o/a73af0c1b63fccf3669fa4b00ac4be52

It would be awesome to see this functionality built into BricksLLM.
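
A minimal sketch of the batching idea above, assuming a long-lived asyncio process sits in front of the API. `MicroBatcher`, `send_batch`, `max_batch`, and `max_wait` are hypothetical names for illustration; they are not taken from the linked gist or from BricksLLM. Callers submit one item each, and the batcher forwards them upstream as a single request once the batch is full or a short timer expires.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


class MicroBatcher:
    """Collects individual requests and forwards them upstream in groups.

    `send_batch` is a hypothetical callable: it takes a list of inputs and
    returns a list of results in the same order (one upstream request).
    """

    def __init__(
        self,
        send_batch: Callable[[List[str]], Awaitable[List[str]]],
        max_batch: int = 20,
        max_wait: float = 0.05,
    ) -> None:
        self._send_batch = send_batch
        self._max_batch = max_batch
        self._max_wait = max_wait
        self._pending: List[Tuple[str, asyncio.Future]] = []
        self._timer: Optional[asyncio.Task] = None

    async def submit(self, item: str) -> str:
        """Queue one request and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((item, fut))
        if len(self._pending) >= self._max_batch:
            await self._flush()  # batch is full: send immediately
        elif self._timer is None:
            # first item of a new batch: start the flush timer
            self._timer = asyncio.create_task(self._flush_later())
        return await fut

    async def _flush_later(self) -> None:
        await asyncio.sleep(self._max_wait)
        self._timer = None  # clear before flushing so we never cancel ourselves
        await self._flush()

    async def _flush(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if not batch:
            return
        try:
            results = await self._send_batch([item for item, _ in batch])
            if len(results) != len(batch):
                raise ValueError("upstream returned a different number of results")
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)
        except Exception as exc:
            # fail every caller in the batch rather than leaving them waiting
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
```

The pending list lives in process memory, which is why this only works on a persistent server: a function that is torn down after each invocation has nowhere to accumulate a batch.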

3 comments

heyn05tradamu5 926 days ago

They’ve recently added this functionality to AWS Bedrock thankfully. Doesn’t support OpenAI models, but does support Anthropic.

https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-be...


te_chris 926 days ago

If you can get Claude approved.


swyx 926 days ago

How exactly are you intending to batch different prompts together in the OpenAI API? It's not like they accept an array of parallel inputs.


computerex 926 days ago

OpenAI API doesn't support batching afaik.


HellsMaddy 926 days ago

They do: https://platform.openai.com/docs/guides/rate-limits/batching...
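
For context, a rough sketch of what prompt batching can look like, assuming the legacy completions endpoint, where `prompt` may be a list of strings and each returned choice carries an `index` pointing back at its prompt. As far as I know the chat endpoint takes a single conversation per request, so this does not carry over to it directly. The helper names are hypothetical, and no network call is made here; the code only builds the request body and regroups a response.

```python
from typing import Any, Dict, List


def build_completion_batch(
    prompts: List[str], model: str, max_tokens: int = 64
) -> Dict[str, Any]:
    """Request body for one completions call carrying several prompts."""
    return {"model": model, "prompt": prompts, "max_tokens": max_tokens}


def split_completion_batch(response: Dict[str, Any], n_prompts: int) -> List[str]:
    """Map choices back to their prompts using each choice's `index`.

    Choices are not assumed to arrive in prompt order, so the index field
    is used instead of the position in the list.
    """
    texts = [""] * n_prompts
    for choice in response["choices"]:
        texts[choice["index"]] = choice["text"]
    return texts
```

Several prompts sent this way count as one request against the requests-per-minute limit, though the tokens still count against the token limit.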


treprinum 926 days ago

Embeddings can be batched.
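
A small sketch of that, assuming the embeddings endpoint's `input` field takes a list of strings and the response's `data` entries each carry an `index` and an `embedding`. The helper names and the `batch_size` default are made up for illustration; the real per-request limit should be checked against the API docs.

```python
from typing import Any, Dict, Iterator, List


def embedding_payloads(
    texts: List[str], model: str, batch_size: int = 100
) -> Iterator[Dict[str, Any]]:
    """Yield one request body per chunk of `batch_size` texts."""
    for start in range(0, len(texts), batch_size):
        yield {"model": model, "input": texts[start:start + batch_size]}


def vectors_in_order(response: Dict[str, Any]) -> List[List[float]]:
    """Return the embeddings of one response sorted by their `index`."""
    ordered = sorted(response["data"], key=lambda d: d["index"])
    return [d["embedding"] for d in ordered]
```

With these defaults, 250 texts turn into three requests instead of 250.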
