by philbo 1039 days ago
Thanks!

OpenAI embeddings are 1 per request payload, right? Have you hit any rate limits doing that?

We have a performance budget of ~1 second for the whole generate-index-search pipeline, which may or may not be feasible. I discounted OpenAI because it seemed like we'd be guaranteed to hit the rate limit if we flooded them with concurrent requests for embeddings. The typical corpus we need to work with is 20 concurrent documents ranging from ~100 KB to ~2 MB. Chunking those documents to fit the 8k-token context window balloons the request count further.
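To put rough numbers on that, here's a back-of-the-envelope sketch. It assumes about 4 bytes of text per token (a common rule of thumb for English, not a measured figure for these documents) and one request per chunk; `estimated_chunks` and the constants are names made up for illustration.

```python
import math

# Assumption: ~4 bytes of plain English text per token (rule of thumb only).
BYTES_PER_TOKEN = 4
# The ~8k-token window mentioned above, rounded for the estimate.
MAX_TOKENS = 8_000

def estimated_chunks(doc_bytes: int, chunk_tokens: int = MAX_TOKENS) -> int:
    """Estimate how many chunks a document of doc_bytes splits into."""
    tokens = math.ceil(doc_bytes / BYTES_PER_TOKEN)
    return math.ceil(tokens / chunk_tokens)

small = estimated_chunks(100_000)    # ~100 KB document
large = estimated_chunks(2_000_000)  # ~2 MB document
print(small, large)            # 4 63
print(20 * small, 20 * large)  # 80 1260 (whole corpus, one request per chunk)
```

So under those assumptions a 20-document corpus is somewhere between 80 and 1,260 chunks at the maximum chunk size, and proportionally more with smaller chunks.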

1 comment

You absolutely want to chunk them smaller than 8k. Have you tested different chunk strategies? It can make a huge difference for actually recalling useful information in small enough chunks to be usable.

Thanks for the tip, I haven't played around with chunk size much at all so far.
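
For anyone who wants to experiment with chunk size, here's a minimal sketch of one possible strategy: fixed-size chunks with some overlap. It counts whitespace-separated words as a stand-in for real tokens (an assumption; a real tokenizer will give different counts), and `chunk_words` and its defaults are hypothetical, not anything from the thread.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to chunk_size words, with overlap words
    repeated between neighbouring chunks."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Varying `chunk_size` and `overlap` and checking which settings retrieve the most useful passages is one cheap way to compare strategies before committing to one.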