by HellsMaddy 926 days ago
This reminds me of an idea I had for an OpenAI proxy that transparently batches requests. The use case: OpenAI rate-limits not only tokens but also requests per minute, so by combining multiple requests into one you can avoid hitting the request limit.

This isn’t really feasible to implement if your app runs on Lambda or edge functions; you’d need a persistent server that can hold incoming requests while a batch fills up.
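A minimal sketch of the batching step such a proxy might perform, in Python with asyncio, running inside one long-lived process. `MicroBatcher`, `max_size`, `max_wait`, and `fake_upstream` are hypothetical names, not part of BricksLLM or any OpenAI library. The sketch assumes the upstream call accepts a list of inputs and returns one result per input, in order. The replies below question whether that holds for chat prompts, while embeddings do accept a list.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Generic, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class MicroBatcher(Generic[T, R]):
    """Hypothetical in-process batcher: single requests that arrive within
    `max_wait` seconds (up to `max_size` of them) go upstream as one call."""

    def __init__(self, batch_fn: Callable[[list[T]], Awaitable[list[R]]],
                 max_size: int = 16, max_wait: float = 0.05) -> None:
        self._batch_fn = batch_fn  # assumed: returns one result per item, in order
        self._max_size = max_size
        self._max_wait = max_wait
        self._pending: list[tuple[T, asyncio.Future[R]]] = []
        self._timer: asyncio.TimerHandle | None = None
        self._tasks: set[asyncio.Task[None]] = set()  # keep flush tasks referenced

    async def submit(self, item: T) -> R:
        loop = asyncio.get_running_loop()
        fut: asyncio.Future[R] = loop.create_future()
        self._pending.append((item, fut))
        if len(self._pending) >= self._max_size:
            self._flush()  # batch is full: send it now
        elif self._timer is None:
            # First item of a new batch: flush after max_wait at the latest.
            self._timer = loop.call_later(self._max_wait, self._flush)
        return await fut

    def _flush(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if batch:
            task = asyncio.get_running_loop().create_task(self._run(batch))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _run(self, batch: list[tuple[T, asyncio.Future[R]]]) -> None:
        items = [item for item, _ in batch]
        try:
            results = await self._batch_fn(items)
            if len(results) != len(items):
                raise ValueError("upstream returned a different number of results")
        except Exception as exc:
            # One upstream failure fails every caller in the batch.
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
            return
        # Hand each caller the result at its own position.
        for (_, fut), result in zip(batch, results):
            if not fut.done():
                fut.set_result(result)


async def demo() -> tuple[list[str], list[list[str]]]:
    calls: list[list[str]] = []

    async def fake_upstream(items: list[str]) -> list[str]:
        # Stand-in for a real API call that accepts a list of inputs.
        calls.append(list(items))
        await asyncio.sleep(0)
        return [s.upper() for s in items]

    batcher: MicroBatcher[str, str] = MicroBatcher(fake_upstream, max_size=4, max_wait=0.01)
    results = await asyncio.gather(*(batcher.submit(w) for w in "abcde"))
    return list(results), calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With `max_size=4`, five concurrent submissions become two upstream calls: a full batch of four, then a batch of one flushed by the timer. `max_wait` trades a little added latency for fewer requests per minute.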

Here’s a diagram I drew of a simple approach that came to mind: https://gist.github.com/b0o/a73af0c1b63fccf3669fa4b00ac4be52

It would be awesome to see this functionality built into BricksLLM.


They’ve recently added this functionality to AWS Bedrock, thankfully. It doesn’t support OpenAI models, but it does support Anthropic’s.

https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-be...

If you can get Claude approved.
How exactly are you intending to batch different prompts together in the OpenAI API? It's not like it accepts an array of parallel inputs.
The OpenAI API doesn't support batching, AFAIK.
Embeddings can be batched.
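Because embeddings accept several inputs in one request, the remaining job for a proxy is to split the batched response back out to the original callers. A sketch, assuming the embeddings endpoint's request shape (`input` given as a list of strings) and response shape (each item in `data` carries an `index` pointing back into that list). `build_batched_request` and `fan_out` are hypothetical helpers, the model name is only an example, and no network call is made.

```python
from __future__ import annotations

import json
from typing import Any


def build_batched_request(texts: list[str], model: str) -> str:
    """Serialize one embeddings request body that carries several inputs."""
    return json.dumps({"model": model, "input": texts})


def fan_out(response: dict[str, Any], n_inputs: int) -> list[list[float]]:
    """Place each embedding at the position of the input that produced it."""
    vectors: list[list[float] | None] = [None] * n_inputs
    for item in response["data"]:
        # Use "index" rather than list order to match results to inputs.
        vectors[item["index"]] = item["embedding"]
    if any(v is None for v in vectors):
        raise ValueError("response is missing embeddings for some inputs")
    return vectors  # type: ignore[return-value]


if __name__ == "__main__":
    print(build_batched_request(["first text", "second text"], "text-embedding-ada-002"))
    # Fake response with items out of order, to show the index-based mapping.
    fake = {"data": [{"index": 1, "embedding": [0.3, 0.4]},
                     {"index": 0, "embedding": [0.1, 0.2]}]}
    print(fan_out(fake, 2))
```

Each caller then receives only its own vector, so several callers' inputs count as one request against the requests-per-minute limit (token limits still apply to the combined input).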