	by blakesterz 800 days ago
	Maybe a dumb question, but I think anyone reading this would know a good answer. If I have a big pile of PDFs and want an LLM to be really good at answering questions about what's in them, would it be best to try running this locally? "Best" here means getting the best/smartest answers to my questions about these PDFs. They're all full-text PDFs: studies and results on a specific genetic condition that I'd like to understand better by asking some smart questions.

3 comments

verdverm 800 days ago

LlamaIndex makes this task possible in surprisingly few lines of code: https://docs.llamaindex.ai/en/stable/understanding/putting_i...

You'll likely want to move beyond the first examples so you can choose your own models & methods. Either way, LlamaIndex has tons of great documentation and was originally built for this purpose. They also have a commercial parsing product with very generous free quotas (last I checked).


manishsharan 800 days ago

If it's just for you, may I suggest OpenAI's Python notebook examples. This is the one I used to get started:

https://cookbook.openai.com/examples/parse_pdf_docs_for_rag

There are several other examples like this, but I got stuck in the jargon of LangChain, LlamaIndex, etc.
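
For anyone else stuck on the jargon, here is a rough, library-free sketch of the retrieval step these tools perform: split the text into chunks, score each chunk against the question, and paste the best ones into the prompt. It assumes the PDF text has already been extracted to plain strings, and it uses a simple word-count cosine similarity as a stand-in for the learned embeddings real pipelines use. All function names here are made up for illustration and are not part of LangChain, LlamaIndex, or the OpenAI cookbook.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble the text you would send to an LLM (the LLM call itself is omitted)."""
    context = "\n---\n".join(top_chunks(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

You would call `chunk_text` on each document's extracted text, pool the chunks, and send the output of `build_prompt` to whichever model you choose. Swapping `bag_of_words` and `cosine` for a real embedding model is the main thing the libraries add on top of this.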


solardev 800 days ago

Not self-hosted, but Google's NotebookLM is OK at that: https://notebooklm.google.com/

You can also upload files to ChatGPT and ask questions about them.
