Hey HN, Matusa here! A friend and I have built Memora. Memora is a vector database with built-in multistage reranking, which can significantly improve search accuracy over plain semantic search. It also features a proprietary embedding model tailored for RAG use cases, where there's a structural mismatch between the content stored and the query used for searching (which is why HyDE works well).

Memora started because we were working on a stealth AI startup where we used an agent that would query a vector DB, but it would take multiple tries for the agent to find what it needed (20% of the time it couldn't find it at all). This process was not only costly but also time-consuming, with each search taking up precious seconds. We realized that our biggest bottleneck was the accuracy of the semantic search results.

So, in order to improve the product, we had to go beyond simple semantic search, and we ended up creating a retrieval pipeline that uses semantic search as the initial step, providing a first batch of 1k documents. These documents are then reranked using neural rankers. Not only were we able to increase the product's accuracy by over 4x, we were also able to completely eliminate the need for the agent to make multiple search queries.

A cool challenge was creating the two ranking models for Memora's retrieval pipeline. We applied the RankT5 principle of adapting a generative LLM into a dedicated ranking model, transforming llama-7b into rank-llama. We then fine-tuned it further on a ton of synthetic data. However, running a model with 7B parameters can be costly. That's where our second ranking model, with 120M parameters, comes into play. This model was crafted by distilling rank-llama.

On top of that, we're also trying to focus on offering a great DX: i) we feel that our JavaScript/TypeScript library offers great developer ergonomics by using the builder pattern; ii) having our own embedding model allows us to streamline the experience.
Instead of calling one API to embed your data and another API to store the embedding, you simply call Memora, pass in your data, and we handle the embedding and storage.

That said, Memora is still in its early stages. Both the embedding model and the retrieval pipeline are far from perfect. However, we feel it has reached a point where it works pretty well for most use cases. To be honest, we still see some low-hanging fruit for improving the models, but we are advocates of launching early.

We're thrilled to share Memora with y'all, and we would love to hear any feedback or critiques you might have!
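
For anyone curious what a retrieve-then-rerank flow looks like in general, here's a minimal Python sketch. To be clear, this is not Memora's code or API: every name in it (`retrieve`, `rerank`, `search`, `overlap_scorer`) is made up for illustration, the word-overlap scorer is a toy stand-in for a neural ranker, and chaining several rankers one after another is just one assumed way to do "multistage".

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[str, str], float]  # (query, document) -> relevance score


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, index: List[Tuple[str, Vector]], k: int) -> List[str]:
    """Stage 1: plain semantic search, returning the k nearest documents."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def rerank(query: str, docs: List[str], scorer: Scorer, keep: int) -> List[str]:
    """Later stage: re-score each candidate against the query text, keep the best."""
    return sorted(docs, key=lambda doc: scorer(query, doc), reverse=True)[:keep]


def search(
    query: str,
    query_vec: Vector,
    index: List[Tuple[str, Vector]],
    stages: List[Tuple[Scorer, int]],
    first_k: int = 1000,
) -> List[str]:
    """Semantic search for first_k candidates, then apply each reranking stage."""
    candidates = retrieve(query_vec, index, first_k)
    for scorer, keep in stages:
        candidates = rerank(query, candidates, scorer, keep)
    return candidates


def overlap_scorer(query: str, doc: str) -> float:
    """Toy stand-in for a neural ranker: fraction of query words found in doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# Hypothetical data: hand-written 2-d "embeddings" for three documents.
index = [
    ("reset your password in settings", [1.0, 0.0]),
    ("billing and invoices", [0.0, 1.0]),
    ("password rules and length", [0.9, 0.1]),
]
query = "how to reset password"
query_vec = [0.8, 0.2]

first_pass = retrieve(query_vec, index, k=2)
final = search(query, query_vec, index, stages=[(overlap_scorer, 1)], first_k=2)
```

In this toy data the embedding-only pass ranks "password rules and length" first, while the reranking stage promotes "reset your password in settings", which is the kind of correction a second stage is meant to provide. With real models, each entry in `stages` would wrap a ranker call, for example a small model trimming the list before a larger one sees it.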