by jiggawatts
87 days ago
Unless I'm missing something, this uses a simple synchronous for loop:

  for text in texts:
      key = (text, model)
      if key not in pickle_cache:
          pickle_cache[key] = openai_client.create_embedding(text, model=model)
      embeddings.append(pickle_cache[key])
  operations.save_pickle_cache(pickle_cache, pickle_path)
  return embeddings

At the throughput I was seeing, about one embedding per second, a million comments would take over a week to process (roughly 11.6 days)! I had to call the Gemini model with ten comments at a time from eight threads to reach even the paltry 3K RPM rate limit they offer to "Tier 1" customers. Based on this experience, for real "enterprise" customers I might implement a generic wrapper for Google's Batch API that could stream rows continuously from a database, chunk them, upload the chunks, and then, in parallel, check the status of pending jobs and stream the results back into a database.
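For anyone curious, the "ten comments at a time from eight threads" approach can be sketched roughly like this. It is only a minimal illustration, not the code I actually ran: `embed_batch` is a hypothetical callable standing in for whatever provider call accepts a list of texts and returns one vector per text, and `chunked`, `embed_all`, and `fake_embed_batch` are names made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],  # hypothetical provider call
    batch_size: int = 10,
    workers: int = 8,
) -> List[Vector]:
    """Embed `texts` by sending batches of `batch_size` from `workers` threads."""
    batches = list(chunked(texts, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so the output
        # vectors line up with `texts` even though requests overlap.
        results = pool.map(embed_batch, batches)
        return [vector for batch in results for vector in batch]


def fake_embed_batch(batch: Sequence[str]) -> List[Vector]:
    """Stand-in for a real API call: one 1-dimensional vector per text."""
    return [[float(len(text))] for text in batch]


if __name__ == "__main__":
    vectors = embed_all(["a", "bb", "ccc"], fake_embed_batch, batch_size=2, workers=2)
    print(vectors)
```

Threads are enough here because the work is network-bound. A real version would also need retries and some throttling to stay under the rate limit, which this sketch leaves out.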
|
Just plug any async function into the provided async context manager and you get Batch APIs in two lines of code, with whatever framework you already use: https://github.com/vienneraphael/batchling
Let me know if you have any questions. I'm looking forward to your feedback!