Hacker News new | ask | show | jobs
by wufufufu 1178 days ago
Oh so there is pre-processing to find the useful portions? What are you using for the pre-processing?

I feel that it's inevitable that OpenAI et al. will be able to handle large PDF documents eventually. But until then I'm sure there's a lot of value of in this kind of pre-processing/chunking.

1 comments

Yeah I think you're right - the 32k context window for GPT-4 (not available for everyone yet) is already enough for research papers. I'm using a library called Langchain, there's also LlamaIndex.