HN Mirror

Hacker News


	by imperfect_light 338 days ago
	The emphasis on PDFs for RAG seems like something out of the 1990s. Are there any good frameworks for using RAG if your company doesn't go around creating documents left and right? Sure, documents, emails, and presentations will cover the most common use cases. But our databases cover all the questions the RAG might be asked, and they hold far more answers than live in documents.

petesergeant 338 days ago

That's because PDFs are the hard part. If you're starting with small pieces of text, RAG becomes much, much easier.

imperfect_light 337 days ago

My question is less about PDFs and more about the assumption that all the facts the RAG needs are in documents. In my experience, only a fraction of the questions that might be useful are answered in a document somewhere. There must be a variant of RAG that pulls not from documents but from databases, using some semantic model.

petesergeant 337 days ago

Sure, but the process for this is laughably easy: render each record as text, with the minimum amount of surrounding text needed to place it in context, and submit that to whatever embedding model you use to get the embedding. You could even store the embedding in the same DB row if your DB is happy doing vector searches.
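A minimal sketch of that process, assuming SQLite from the Python standard library as the database and a toy hashed bag-of-words function standing in for a real embedding model. The `customers` table, its columns, `render_row`, and `embed` are all hypothetical. SQLite has no built-in vector search, so the similarity ranking here is done in Python; a database with vector support (for example PostgreSQL with the pgvector extension) would do that step inside the query.

```python
import math
import sqlite3
import struct
import zlib

DIM = 64  # toy embedding size (assumption, not a real model's dimension)


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pack(vec: list[float]) -> bytes:
    # Store the vector as a float32 BLOB in the same row as the data.
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def render_row(name: str, region: str, revenue: int) -> str:
    # Add just enough context that the raw values mean something on their own.
    return f"Customer {name} is in the {region} region with annual revenue of {revenue} USD."


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, "
        "region TEXT, revenue INTEGER, embedding BLOB)"
    )
    rows = [("Acme", "EMEA", 120000), ("Globex", "APAC", 95000), ("Initech", "North America", 40000)]
    for name, region, revenue in rows:
        vec = embed(render_row(name, region, revenue))
        conn.execute(
            "INSERT INTO customers (name, region, revenue, embedding) VALUES (?, ?, ?, ?)",
            (name, region, revenue, pack(vec)),
        )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, question: str, k: int = 2) -> list[str]:
    """Return the rendered text of the k rows most similar to the question."""
    q = embed(question)
    scored = []
    for name, region, revenue, blob in conn.execute(
        "SELECT name, region, revenue, embedding FROM customers"
    ):
        # Dot product equals cosine similarity because both vectors are unit length.
        score = sum(a * b for a, b in zip(q, unpack(blob)))
        scored.append((score, render_row(name, region, revenue)))
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


if __name__ == "__main__":
    db = build_db()
    for hit in search(db, "Which customer is in the APAC region?"):
        print(hit)
```

To use this for real, you would swap `embed` for a call to your embedding model and paste the top results, as text, into the LLM prompt as context. Rendering rows as short sentences keeps each retrieved chunk small and self-describing, which is the "small pieces of text" case mentioned above.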