by joewferrara
983 days ago
This is a great article about the engineering challenges of building a RAG system at scale, where performance means speed and compute. A topic it does not address is how to evaluate a RAG system when performance means whether the system retrieves the correct context and answers questions accurately. A RAG system should be built so that its parts (retriever, embedder, etc.) can easily be swapped out and modified to improve answer accuracy. That accuracy should be assessed during development and then continuously monitored.
|
You are right. Retrieval accuracy is important as well. From an accuracy perspective, are there any tools you have found useful for validating retrieval accuracy?
In our current architecture, all the pieces of the RAG ingestion pipeline are modifiable, so loading, chunking and embedding can each be improved.
As part of our development process, we have started to enable other tools that we don't talk about as much in the article, including a pre-processing and embeddings playground (https://www.neum.ai/post/pre-processing-playground) for testing different combinations of modules against a piece of text. The idea is that you can establish your ideal pipeline / transformations, which can then be scaled.
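
To make the evaluation point concrete, here is a minimal sketch of scoring interchangeable retrievers by hit rate at k, i.e. the share of labeled queries whose expected chunk shows up in the top k results. Everything in it is an assumption for illustration: the `Retriever` type, the toy `CORPUS`, the two stand-in retrievers and the `LABELED` queries are hypothetical, and none of it is Neum AI's API or the pipeline described in the article.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a retriever maps (query, k) to a ranked list of chunk IDs.
Retriever = Callable[[str, int], List[str]]

# Toy corpus standing in for chunked, indexed documents (illustrative only).
CORPUS: Dict[str, str] = {
    "c1": "chunking splits documents into smaller passages",
    "c2": "embeddings map text to vectors",
    "c3": "retrievers rank passages for a query",
}

# Hand-labeled (query, expected chunk ID) pairs; a real set would be much larger.
LABELED: List[Tuple[str, str]] = [
    ("how do embeddings map text", "c2"),
    ("which retrievers rank passages", "c3"),
]


def keyword_retriever(query: str, k: int) -> List[str]:
    """Rank chunks by the number of words they share with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda cid: (-len(words & set(CORPUS[cid].split())), cid),
    )
    return ranked[:k]


def first_k_retriever(query: str, k: int) -> List[str]:
    """Baseline that ignores the query and returns the first k chunk IDs."""
    return sorted(CORPUS)[:k]


def hit_rate_at_k(
    retriever: Retriever, labeled: List[Tuple[str, str]], k: int
) -> float:
    """Fraction of queries whose expected chunk ID appears in the top-k results."""
    if not labeled:
        return 0.0
    hits = sum(1 for query, expected in labeled if expected in retriever(query, k))
    return hits / len(labeled)


def compare(retrievers: Dict[str, Retriever], k: int) -> Dict[str, float]:
    """Score every candidate retriever on the same labeled set."""
    return {name: hit_rate_at_k(r, LABELED, k) for name, r in retrievers.items()}


if __name__ == "__main__":
    candidates = {"keyword": keyword_retriever, "first_k": first_k_retriever}
    print(compare(candidates, k=1))
```

Because every candidate is just a function with the same signature, swapping in a different embedder or chunking strategy only means adding another entry to `candidates`. The same labeled set could be re-run on a schedule to cover the continuous-monitoring part.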