by blakesterz 753 days ago
Maybe a dumb question, but I think anyone reading this would know a good answer. If I have a big pile of PDFs and want an LLM to be really good at answering questions about what's in them, would it be best to run this locally? "Best" here means getting the best/smartest answers to my questions about these PDFs. They're all full-text PDFs, studies and results on a specific genetic condition that I'd like to understand better by asking some smart questions.
3 comments

LlamaIndex makes this possible in surprisingly few lines of code: https://docs.llamaindex.ai/en/stable/understanding/putting_i...

You'll likely want to move beyond the first examples so you can choose your own models and methods. Either way, LlamaIndex has tons of great documentation and was originally built for this purpose. They also have a commercial parsing product with a very generous free quota (last I checked).
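For anyone wondering what these tools are doing under the hood, here's a rough sketch of the retrieval step, assuming the PDF text has already been extracted to plain strings. To be clear, this is not LlamaIndex's API. It uses plain word-count cosine similarity as a stand-in for real embeddings, and every function name here (`chunk_text`, `retrieve`, `build_prompt`) is made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text, chunk_size=50):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=2):
    """Return up to top_k chunks that share words with the question,
    most similar first. Real tools use embeddings here (assumption:
    word overlap is only a crude stand-in)."""
    q = Counter(tokenize(question))
    scored = [(cosine_similarity(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]


def build_prompt(question, chunks, top_k=2):
    """Assemble the text that would be sent to the LLM (hypothetical format)."""
    context = "\n---\n".join(retrieve(question, chunks, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

You'd call `chunk_text` on each document, pool the chunks, and pass the output of `build_prompt` to whatever model you end up using, local or hosted. The libraries add the parts that matter for quality: proper PDF parsing, embeddings, and a vector index.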

If it's just for you, may I suggest OpenAI's Python notebook examples. This is the one I used to get started:

https://cookbook.openai.com/examples/parse_pdf_docs_for_rag

There are several other examples like this, but I got stuck in the jargon of LangChain, LlamaIndex, etc.

Not self-hosted, but Google's NotebookLM is OK at that: https://notebooklm.google.com/

You can also upload files to ChatGPT and ask questions about them.