| HN Mirror



	by petesergeant 335 days ago
	That's because PDFs are the hard part. If you're starting with small pieces of text, RAG becomes much, much easier.


imperfect_light 334 days ago

My question is less about PDFs and more about the notion that all the facts needed for RAG are in documents. In my experience, only a fraction of the questions worth asking have answers in a document somewhere. There must be a variation of RAG that pulls not from documents, but from databases, using some semantic model.


petesergeant 334 days ago

Sure, but the process for this is laughably easy: you render the row as text, with the minimum amount of surrounding text needed to place it into context, and submit that to whatever your embedding-maker is to get the embedding. You could potentially store the embedding in the same DB row if you have a DB that's happy with vector searches.
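
A rough sketch of what that could look like, not anyone's actual setup. Everything named here is made up for illustration: the `products` table, `render_row`, and especially `toy_embed`, which is a hashed bag-of-words stand-in for a real embedding model. SQLite has no native vector search, so the embedding is stored as JSON in the same row and the search is a brute-force cosine scan.

```python
import hashlib
import json
import math
import sqlite3

DIM = 64  # toy embedding size (assumption; real models use hundreds of dimensions or more)


def render_row(row: dict) -> str:
    """Render a DB row as a short sentence so the values carry their own context."""
    return (
        f"Product {row['name']} in category {row['category']} "
        f"costs {row['price']} USD."
    )


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        token = token.strip(".,?!")
        if not token:
            continue
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_db(rows: list[dict]) -> sqlite3.Connection:
    """Insert each row together with the embedding of its rendered text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, name TEXT, category TEXT, "
        "price REAL, embedding TEXT)"
    )
    for row in rows:
        embedding = json.dumps(toy_embed(render_row(row)))
        conn.execute(
            "INSERT INTO products (name, category, price, embedding) "
            "VALUES (?, ?, ?, ?)",
            (row["name"], row["category"], row["price"], embedding),
        )
    return conn


def search(conn: sqlite3.Connection, question: str, k: int = 1) -> list[str]:
    """Brute-force cosine search over the stored embeddings (vectors are unit length)."""
    query = toy_embed(question)
    scored = []
    for name, embedding in conn.execute("SELECT name, embedding FROM products"):
        vec = json.loads(embedding)
        scored.append((sum(a * b for a, b in zip(query, vec)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    rows = [
        {"name": "widget", "category": "hardware", "price": 9.99},
        {"name": "ebook", "category": "software", "price": 4.5},
    ]
    conn = build_db(rows)
    print(search(conn, "how much does the widget cost?"))
```

With a DB that does support vector search, the JSON column and the Python loop would be replaced by a vector column and a nearest-neighbour query; the render-then-embed step stays the same.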
