by sansseriff
396 days ago
It would be great to search the literature semantically with embeddings. At least one person I know of is trying to build a vector database of all arXiv papers.

The big problem I see is attribution and citation. An embedding is just a vector. It carries no citation back to the source material, no modification date, no certificate of authenticity. So when embeddings are used in RAG, they only serve as links back to a particular page of source material. Using embeddings as links doesn't dramatically change how citation and attribution work in technical writing: you still end up citing a whole paper, or a page of one.

I think GraphRAG [1] is a more useful thing to build on for technical literature. There are ways to use graphs to cite a particular concept on a particular page of an academic paper, and to make those 'citations' act as bidirectional links between new and old scientific discourse. But I digress.

[1] https://microsoft.github.io/graphrag/
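To make the provenance point concrete, here is a minimal Python sketch. It is not GraphRAG's API or pipeline. It assumes each embedded chunk is stored next to a hypothetical `Source` record (arXiv ID, page, modification date), because the vector itself carries none of that. It also adds a toy `ConceptGraph` where a concept is pinned to one page and each citation is recorded in both directions. All class names, fields, and the placeholder IDs are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Provenance a bare embedding does not carry (hypothetical schema)."""
    arxiv_id: str
    page: int
    modified: str  # ISO-8601 date of the version that was embedded


@dataclass
class Chunk:
    vector: list    # the embedding itself: just a list of floats
    source: Source  # must be stored alongside, or the link back to the paper is lost


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query, k=1):
    """Rank chunks by similarity; a hit is only as citable as its attached Source."""
    ranked = sorted(chunks, key=lambda c: cosine(c.vector, query), reverse=True)
    return [c.source for c in ranked[:k]]


class ConceptGraph:
    """Toy graph: each concept is pinned to one page, and every citation is stored
    in both directions so older work can list what builds on it."""

    def __init__(self):
        self.location = {}  # concept -> Source (paper + page)
        self.cites = {}     # concept -> concepts it cites (new -> old)
        self.cited_by = {}  # concept -> concepts citing it (old -> new)

    def add_concept(self, name, source):
        self.location[name] = source
        self.cites.setdefault(name, set())
        self.cited_by.setdefault(name, set())

    def cite(self, new, old):
        if new not in self.location or old not in self.location:
            raise KeyError("both concepts must be added before linking them")
        self.cites[new].add(old)
        self.cited_by[old].add(new)


# Usage with placeholder IDs (not real papers)
old_src = Source("0000.00001", 3, "2023-01-01")
new_src = Source("0000.00002", 7, "2023-06-01")

chunks = [Chunk([1.0, 0.0], old_src), Chunk([0.0, 1.0], new_src)]
print(search(chunks, [0.9, 0.1]))
# [Source(arxiv_id='0000.00001', page=3, modified='2023-01-01')]

graph = ConceptGraph()
graph.add_concept("old concept", old_src)
graph.add_concept("new concept", new_src)
graph.cite("new concept", "old concept")
print(graph.cited_by["old concept"])  # {'new concept'}
```

The vector search can only ever hand back whatever metadata you bolted onto the chunk, which is a page at best. The graph version keys citations on concepts rather than pages, and storing both `cites` and `cited_by` is what makes the links navigable from old work to new as well as the other way around.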