by patresh
830 days ago
The high-level API looks very smooth for quickly iterating on RAG experiments. It seems great for prototyping, but I have doubts about whether it's a good idea to hide the LLM-calling logic in a DB extension. Error handling would be problematic when you get rate limited, the token has expired, or the input exceeds the token limit, and from a security point of view it requires your DB to call OpenAI directly, which can also be risky. Personally I haven't used that many Postgres extensions, so perhaps these risks are mitigated in some way I don't know about?
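To make the error-handling concern concrete, here is a rough sketch of the kind of logic that usually lives in application code and is hard to express inside a DB extension. Everything in it is hypothetical: the exception classes, `embed_with_handling`, and the `embed` / `refresh_token` callables are made-up stand-ins, not the extension's API or any real client library.

```python
import time


# Hypothetical error types; a real client library would define its own.
class RateLimited(Exception):
    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


class TokenExpired(Exception):
    pass


class InputTooLong(Exception):
    pass


def embed_with_handling(embed, text, refresh_token, max_chars=8000,
                        max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call embed(text), treating the three failure modes differently."""
    refreshed = False
    attempt = 0
    while True:
        try:
            return embed(text)
        except RateLimited as exc:
            if attempt >= max_retries:
                raise
            # Prefer the server's hint; otherwise back off exponentially.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
            attempt += 1
        except TokenExpired:
            if refreshed:
                raise  # a fresh token also failed; give up
            refresh_token()
            refreshed = True
        except InputTooLong:
            if len(text) <= max_chars:
                raise  # already truncated (or short); nothing more to try
            text = text[:max_chars]  # crude character-based truncation
```

In an application you can decide per call whether to retry, refresh, truncate, or surface the error to the user; inside a SQL function those choices are much harder to expose.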
On Tembo cloud, we deploy this as part of the VectorDB and RAG Stacks. So you get a dedicated Postgres instance, and a container next to Postgres that hosts the text-to-embeddings transformers. The API calls/data never leave your namespace.