by HellsMaddy
926 days ago
This reminds me of an idea I had for an OpenAI proxy that transparently handles batching of requests. The use case is that OpenAI rate-limits not only tokens but also requests per minute, so by batching multiple requests together you can avoid hitting the request limit. This isn't really feasible if your app runs on Lambda or edge functions, because the proxy has to hold pending requests in memory between invocations; you'd need a persistent server. Here's a diagram I drew of a simple approach that came to mind: https://gist.github.com/b0o/a73af0c1b63fccf3669fa4b00ac4be52 It would be awesome to see this functionality built into BricksLLM.
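To make the batching idea concrete, here is a rough sketch of the core of such a proxy in Python. It is only an illustration under assumptions: `MicroBatcher`, `send_batch`, `max_batch`, and `max_wait` are hypothetical names, not part of BricksLLM or any OpenAI client, and `send_batch` stands in for whatever single upstream call would carry several prompts at once. It is not necessarily the same design as the linked diagram.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

# Hypothetical type: one upstream call that takes several prompts and
# returns one result per prompt, in the same order.
SendBatch = Callable[[List[str]], Awaitable[List[str]]]


class MicroBatcher:
    """Collects individual prompts and forwards them upstream in one call."""

    def __init__(self, send_batch: SendBatch, max_batch: int = 20,
                 max_wait: float = 0.05) -> None:
        self._send_batch = send_batch
        self._max_batch = max_batch    # flush as soon as this many are queued
        self._max_wait = max_wait      # or after this many seconds
        self._pending: List[Tuple[str, asyncio.Future]] = []
        self._timer: Optional[asyncio.TimerHandle] = None
        self._tasks: set = set()       # keep references to in-flight dispatches

    async def submit(self, prompt: str) -> str:
        """Queue one prompt and wait for its individual result."""
        loop = asyncio.get_running_loop()
        fut: asyncio.Future = loop.create_future()
        self._pending.append((prompt, fut))
        if len(self._pending) >= self._max_batch:
            self._flush()
        elif self._timer is None:
            # First item of a new batch starts the wait window.
            self._timer = loop.call_later(self._max_wait, self._flush)
        return await fut

    def _flush(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if batch:
            task = asyncio.get_running_loop().create_task(self._dispatch(batch))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _dispatch(self, batch: List[Tuple[str, asyncio.Future]]) -> None:
        try:
            results = await self._send_batch([prompt for prompt, _ in batch])
            for (_, fut), result in zip(batch, results):
                if not fut.done():
                    fut.set_result(result)
        except Exception as exc:
            # If the single upstream call fails, every caller in the batch fails.
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
```

Each caller just awaits `batcher.submit(prompt)` and gets its own result back, while the upstream only sees one request per batch. The pending queue and timer live in process memory, which is why this needs a long-lived server rather than a per-request function.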
https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-be...