by philbo
1039 days ago
Thanks! OpenAI embeddings are one per request payload, right? Have you hit any rate limits doing that? We have a performance budget of ~1 second for the whole generate-index-search pipeline, which may or may not be feasible. I discounted OpenAI because it seemed like we'd be guaranteed to hit the rate limit if we flooded them with concurrent requests for embeddings. The typical corpus we need to work with is 20 concurrent documents ranging from ~100 KB to ~2 MB. Chunking those documents to fit the 8k-token context window balloons the request count further.
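For a rough sense of scale, here is a back-of-envelope sketch of the request count. It rests on assumptions that are not from any measurement: roughly 4 bytes of text per token (a common heuristic for English), 1 KB = 1,000 bytes, chunks filled right up to the 8k limit, and one chunk per request. The function names are made up for illustration.

```python
import math

# Assumption: ~4 bytes of English text per token (rough heuristic, not measured).
BYTES_PER_TOKEN = 4
# The "8k" context window mentioned above, taken here as 8,000 tokens.
MAX_TOKENS_PER_CHUNK = 8_000


def estimate_chunks(doc_bytes: int) -> int:
    """Estimate how many chunks one document needs to fit the context window."""
    tokens = math.ceil(doc_bytes / BYTES_PER_TOKEN)
    return max(1, math.ceil(tokens / MAX_TOKENS_PER_CHUNK))


def estimate_requests(doc_sizes: list[int]) -> int:
    """Total requests for a corpus, assuming one chunk per request."""
    return sum(estimate_chunks(size) for size in doc_sizes)


if __name__ == "__main__":
    best_case = estimate_requests([100_000] * 20)     # 20 docs of ~100 KB
    worst_case = estimate_requests([2_000_000] * 20)  # 20 docs of ~2 MB
    print(best_case, worst_case)  # prints "80 1260"
```

Under those assumptions, a 20-document corpus lands somewhere between about 80 and about 1,260 requests, all of which would have to complete inside the ~1 second budget.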