Hacker News
The CLaRa-7B models unify RAG and provide built-in semantic doc compression (huggingface.co)
1 points by anactofgod 187 days ago
1 comment

CLaRa (Continuous Latent Reasoning) is an approach to Retrieval-Augmented Generation (RAG) that compresses documents into a small set of "continuous memory tokens" that preserve their key information. It then optimizes and performs both retrieval and generation in that shared latent space. The result is a shorter context, less double encoding of documents, and better response quality.
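To make the "compress once, retrieve and generate from the same memory tokens" idea concrete, here is a toy sketch. It is not CLaRa's implementation. It assumes that documents arrive as lists of token embedding vectors and uses plain chunked mean pooling as a hypothetical stand-in for CLaRa's learned compressor. Retrieval scores the query against the mean of each document's memory tokens using cosine similarity. The "generator input" is simply the query vector followed by the selected memory tokens. The names `compress`, `retrieve`, and `build_generator_input` are illustrative and are not part of any model API.

```python
import math
from typing import List

Vector = List[float]


def compress(token_embs: List[Vector], num_memory: int) -> List[Vector]:
    """Hypothetical stand-in for a learned compressor: split the document's
    token embeddings into num_memory contiguous chunks and mean-pool each
    chunk into one 'memory token'. CLaRa learns this mapping; this does not."""
    n, dim = len(token_embs), len(token_embs[0])
    memory = []
    for i in range(num_memory):
        start, end = i * n // num_memory, (i + 1) * n // num_memory
        # Fall back to a single token if there are more memory slots than tokens.
        chunk = token_embs[start:end] or [token_embs[min(start, n - 1)]]
        memory.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return memory


def pooled(memory: List[Vector]) -> Vector:
    """Average a document's memory tokens into one vector for scoring."""
    dim = len(memory[0])
    return [sum(m[d] for m in memory) / len(memory) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Vector, corpus: List[List[Vector]], top_k: int) -> List[int]:
    """Rank documents by similarity in the shared (compressed) space."""
    scores = [cosine(query, pooled(mem)) for mem in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:top_k]


def build_generator_input(query: Vector, corpus: List[List[Vector]], top_k: int) -> List[Vector]:
    """The generator consumes the query plus the retrieved memory tokens,
    reusing the same compressed vectors instead of re-encoding raw text."""
    context = [query]
    for idx in retrieve(query, corpus, top_k):
        context.extend(corpus[idx])
    return context


if __name__ == "__main__":
    doc_a = [[1.0, 0.0]] * 8  # 8 toy token embeddings
    doc_b = [[0.0, 1.0]] * 8
    corpus = [compress(doc_a, 2), compress(doc_b, 2)]  # each doc -> 2 memory tokens
    query = [1.0, 0.0]
    print(retrieve(query, corpus, top_k=1))
    # Raw context would be 1 + 8 = 9 vectors; compressed context is 1 + 2 = 3.
    print(len(build_generator_input(query, corpus, top_k=1)))
```

The point of the sketch is the data flow: documents are compressed once, and the same compressed vectors serve both retrieval and generation. In CLaRa the compressor and the retriever and generator are trained jointly rather than fixed as they are here.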