Hacker News
by petesergeant 335 days ago
That's because PDFs are the hard part. If you're starting with small pieces of text, RAG becomes much, much easier.

My question is less about PDFs and more about the notion that all the facts needed for RAG are in documents. In my experience, only a fraction of the useful questions can be answered from a document somewhere. There must be a variation of RAG that pulls not from documents but from databases, using some semantic model.
Sure, but the process for this is laughably easy: you render each record as text, adding the minimum amount of surrounding text needed to put it into context, and submit that to whatever your embedding-maker is to get the embedding. You could potentially store the embedding in the same DB row if you have a DB that's happy with vector searches.
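
To make that concrete, here is a minimal sketch of the render, embed, and store-in-the-same-row flow. Everything in it is an assumption for illustration: the `customers` table and its columns are hypothetical, `toy_embed` is a hashed bag-of-words stand-in for whatever embedding model you actually use, and plain SQLite with a brute-force cosine scan stands in for a DB with native vector search.

```python
import hashlib
import json
import math
import re
import sqlite3

DIM = 256  # size of the toy embedding; a real model dictates its own size


def toy_embed(text):
    """Stand-in for a real embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def render_row(name, city, plan):
    """Wrap the raw column values in just enough text to give them context."""
    return f"Customer {name} is based in {city} and is on the {plan} plan."


def build_db(rows):
    """Create a hypothetical customers table, storing each embedding in its row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, name TEXT, city TEXT, plan TEXT, embedding TEXT)"
    )
    for name, city, plan in rows:
        embedding = toy_embed(render_row(name, city, plan))
        conn.execute(
            "INSERT INTO customers (name, city, plan, embedding) VALUES (?, ?, ?, ?)",
            (name, city, plan, json.dumps(embedding)),
        )
    conn.commit()
    return conn


def search(conn, question, k=1):
    """Brute-force cosine search; vectors are unit length, so dot product suffices."""
    query = toy_embed(question)
    scored = []
    for name, emb_json in conn.execute("SELECT name, embedding FROM customers"):
        stored = json.loads(emb_json)
        scored.append((sum(a * b for a, b in zip(query, stored)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


conn = build_db([("Ana", "Lisbon", "free"), ("Bo", "Oslo", "pro")])
print(search(conn, "Which customer is based in Lisbon?"))
```

With a real embedding model and a vector-aware DB, `toy_embed` and the Python-side scan would be replaced by the model call and the DB's own nearest-neighbour query. The rendering step stays the same.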