Personalize embedding results with application data in your database

Shuffling data between services adds a lot of latency to modern, complex ML systems in production. In our experience, these data movement costs dominate the end-to-end latency users experience, more than the actual models or ANN algorithms do, which unfortunately limits what is achievable for interactive applications.

We've extended Postgres with open source models from Hugging Face, as well as vector search and classical ML algorithms, so that everything can happen in the same process. It's significantly faster and cheaper, which leaves a large latency budget available to expand model and algorithm complexity. In addition, open source models have already surpassed OpenAI's text-embedding-ada-002 in quality, not just speed. [1]

Here is a series of posts explaining how to handle the complexity of a typical ML-powered application in a single SQL query that runs in a single process, with memory shared between models and feature indexes, including learned embeddings and reranking models.

- Generating LLM embeddings with open source models in the database [2]

- Tuning vector recall [3]

- Personalize embedding results with application data [4]

This allows a single SQL query to accomplish what would normally take an entire application with several model services and databases.

For example, here are the hops for a modern chatbot built across various services and databases:

  -> application sends user input data to embedding service
      <- embedding model generates a vector to send back to application
  -> application sends vector to vector database
      <- vector database returns associated metadata found via ANN
  -> application sends metadata for reranking
      <- reranking model prunes less helpful context
  -> application sends finished prompt w/ context to generative model
      <- model produces final output
  -> application streams response to user
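
To make the cost of those hops concrete, here is a rough Python sketch that just counts network messages for the flow above and compares it with a single round trip to the database. It is an illustration, not a benchmark: the `Hop` type, the `network_latency_ms` helper, the 5 ms per-hop figure, and the assumption that every step (including generation) runs in the database are all made up for the example. The final streaming step to the user is left out of both paths since it happens either way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One network message between two components (hypothetical type)."""
    source: str
    target: str
    payload: str


# The request path from the list above, one entry per arrow.
DISTRIBUTED_HOPS = [
    Hop("application", "embedding service", "user input"),
    Hop("embedding service", "application", "vector"),
    Hop("application", "vector database", "vector"),
    Hop("vector database", "application", "metadata found via ANN"),
    Hop("application", "reranking model", "metadata"),
    Hop("reranking model", "application", "pruned context"),
    Hop("application", "generative model", "prompt with context"),
    Hop("generative model", "application", "final output"),
]

# Assumption: all steps run inside the database process, so the application
# sends one query and receives one result.
IN_DATABASE_HOPS = [
    Hop("application", "database", "user input"),
    Hop("database", "application", "final output"),
]


def network_latency_ms(hops, per_hop_ms):
    """Total time spent on the wire, ignoring model and ANN compute time."""
    return len(hops) * per_hop_ms


# Hypothetical per-message cost; real numbers depend on payload size,
# serialization, and network topology.
PER_HOP_MS = 5.0

distributed_ms = network_latency_ms(DISTRIBUTED_HOPS, PER_HOP_MS)
in_database_ms = network_latency_ms(IN_DATABASE_HOPS, PER_HOP_MS)

print(f"distributed: {len(DISTRIBUTED_HOPS)} hops, {distributed_ms} ms on the wire")
print(f"in-database: {len(IN_DATABASE_HOPS)} hops, {in_database_ms} ms on the wire")
```

Whatever the real per-hop cost is, the distributed path pays it eight times and the in-database path pays it twice, and that difference is the latency budget freed up for heavier models.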

[1]: https://huggingface.co/spaces/mteb/leaderboard

[2]: https://postgresml.org/blog/generating-llm-embeddings-with-o...

[3]: https://postgresml.org/blog/tuning-vector-recall-while-gener...

[4]: https://postgresml.org/blog/personalize-embedding-vector-sea...

GitHub: https://github.com/postgresml/postgresml