Hacker News
by jsyolo 1189 days ago
Can GPT be used for something like feeding it a 2000-page PDF of pure text (not English) and asking questions about its contents?
4 comments

Not yet, but my bet is that we will be able to in the near future. The gpt-4-32k model (with a 32K context window) allows for about 52 pages of text.
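To put that figure next to the 2000 pages in the question, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above, plus the assumption that "32K" means 32,768 tokens. Non-English text may tokenize less efficiently, so treat the result as a lower bound rather than a measurement.

```python
import math

# Figures quoted in the thread; the exact token count is an assumption.
CONTEXT_TOKENS = 32 * 1024   # assumption: "32K" means 32,768 tokens
PAGES_PER_WINDOW = 52        # the commenter's estimate for gpt-4-32k
DOCUMENT_PAGES = 2000        # size of the PDF in the question

# Implied tokens per page, and what the whole document would cost.
tokens_per_page = CONTEXT_TOKENS / PAGES_PER_WINDOW
total_tokens = DOCUMENT_PAGES * tokens_per_page

# Number of full context windows needed just to read the document once.
windows_needed = math.ceil(DOCUMENT_PAGES / PAGES_PER_WINDOW)

print(f"~{tokens_per_page:.0f} tokens per page")
print(f"~{total_tokens:,.0f} tokens for the whole document")
print(f"{windows_needed} full context windows to read it once")
```

Under these assumptions the document is dozens of context windows long, so it cannot simply be pasted into a single prompt.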
This supports up to 2000 pages: https://www.chatpdf.com/
No. In the near future they will support ~50 pages.
Not without other tooling. Things like langchain and llama_index would be good starting points. One approach would be to use llama_index to create an embedding vector for each section of the PDF. Then, when you ask a question, it computes a vector for your query, retrieves the sections whose vectors are closest to it as context, passes that context plus your query to GPT, and returns the result.
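A minimal sketch of that flow, to make the steps concrete. This is not the llama_index API: `embed` is a toy bag-of-words stand-in for a real embedding model, the sample sections are made up, and `build_index`, `retrieve`, and `build_prompt` are hypothetical helper names. The final call to GPT is left out; the sketch stops at the prompt you would send.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(sections):
    """Step 1: embed every section of the document once, up front."""
    return [(section, embed(section)) for section in sections]

def retrieve(index, query, k=2):
    """Steps 2 and 3: embed the query and return the k closest sections."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [section for section, _ in ranked[:k]]

def build_prompt(context, query):
    """Step 4: put the retrieved context and the query into one prompt."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Made-up sections standing in for chunks of the PDF.
sections = [
    "The warranty covers parts and labor for two years.",
    "Returns are accepted within thirty days of purchase.",
    "Shipping takes five business days within the country.",
]
index = build_index(sections)
question = "How long is the warranty?"
top = retrieve(index, question, k=1)
prompt = build_prompt(top, question)
print(prompt)
```

Only the retrieved sections go into the prompt, which is why the document's total length stops mattering: the context window just has to fit the few sections that look relevant.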

I've seen people say it's better to ask GPT for a fake answer and then use the embedding of that answer to search (so you're looking for context that looks like the answer). I don't know if that's supported in those tools.
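A rough sketch of that fake-answer trick, under heavy assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `draft_fake_answer` is a hypothetical stub that returns a canned string where a real setup would ask GPT to guess an answer. It says nothing about whether langchain or llama_index support this.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(sections, text, k=1):
    """Return the k sections whose vectors are closest to the given text."""
    q = embed(text)
    ranked = sorted(sections, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]

def draft_fake_answer(query):
    """Hypothetical stub: a real version would ask GPT to guess an answer.

    The guess may be factually wrong; it only needs to be phrased the way
    the real answer in the document is likely to be phrased.
    """
    return "The warranty covers repairs for one year."

# Made-up sections standing in for chunks of the PDF.
sections = [
    "Returns are accepted within thirty days of purchase.",
    "Shipping takes five business days within the country.",
    "The warranty covers parts and labor for two years.",
]

query = "What happens if my device breaks?"

# Searching with the raw question: it shares no words with any section.
direct_hit = search(sections, query)

# Searching with the embedding of the fake answer instead.
fake_answer = draft_fake_answer(query)
fake_answer_hit = search(sections, fake_answer)

print("direct:     ", direct_hit[0])
print("fake answer:", fake_answer_hit[0])
```

The fake answer gets the duration wrong, but it is worded like the warranty section, so that section ranks first. The raw question shares no vocabulary with any section in this toy setup, so its top result is arbitrary.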