by whakim
1040 days ago
Are you tied to a particular transformer model? Using a smaller model, throwing more hardware at the problem, or generating embeddings in parallel are all easy ways to speed things up. Depending on what you're doing with the output, you could also truncate your documents (which can work well for things like clustering) or split them into smaller chunks (which can improve search quality). And if you only need search (and aren't training or tuning your own models), another option is a managed search offering, where you aren't responsible for generating embeddings at all.
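Here is a minimal Python sketch of three of those suggestions: truncating a document, splitting it into overlapping chunks, and embedding batches concurrently. It rests on a few assumptions. Whitespace splitting stands in for the embedding model's own tokenizer. `fake_embed_batch` is a hypothetical placeholder for whatever model or API actually produces the vectors. The chunk size, overlap, batch size and worker count are illustrative values, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


def truncate(text: str, max_tokens: int) -> str:
    """Keep only the first max_tokens whitespace-separated tokens.

    Assumption: whitespace splitting approximates tokens; a real pipeline
    would count tokens with the embedding model's tokenizer.
    """
    return " ".join(text.split()[:max_tokens])


def chunk(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` tokens, sharing `overlap` tokens.

    The overlap means text near a window boundary keeps some of its
    surrounding context in the neighbouring chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    tokens = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


def embed_in_parallel(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 32,
    max_workers: int = 4,
) -> List[Vector]:
    """Embed texts in fixed-size batches, running batches concurrently.

    Threads suit I/O-bound calls to a remote embedding API; with a local
    model you would more likely rely on larger batches on a GPU instead.
    """
    batches = [list(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the flattened
        # output lines up one-to-one with `texts`.
        return [vec for batch in pool.map(embed_batch, batches) for vec in batch]


def fake_embed_batch(batch: List[str]) -> List[Vector]:
    """Hypothetical stand-in for a real model or API call."""
    return [[float(len(t)), float(len(t.split()))] for t in batch]


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(10))
    pieces = chunk(doc, size=4, overlap=1)
    vectors = embed_in_parallel(pieces, fake_embed_batch, batch_size=2)
    print(len(pieces), len(vectors))  # prints: 3 3
```

Because the embedding call is passed in as a function, you can swap a local model for a hosted API without touching the chunking or batching logic.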
Naively, I guess, we initially hoped to get by with a third-party API. We're hosted on GCP, so we first tried Vertex AI's `textembedding-gecko` model. Now we're investigating running models on our own infrastructure, though I'm not sure how far that has gotten, since someone else is working on it.