

	by wufufufu 1178 days ago
	Oh, so there is pre-processing to find the useful portions? What are you using for the pre-processing? I feel that it's inevitable that OpenAI et al. will be able to handle large PDF documents eventually. But until then I'm sure there's a lot of value in this kind of pre-processing/chunking.

1 comment

naveedjanmo 1178 days ago

Yeah, I think you're right - the 32k context window for GPT-4 (not available to everyone yet) is already enough for research papers. I'm using a library called LangChain; there's also LlamaIndex.
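
For anyone wondering what the chunking step roughly looks like, here is a minimal sketch in plain Python. It is not how LangChain or LlamaIndex do it internally; the function name `chunk_text` and its parameters are hypothetical, and it assumes character counts are a good enough stand-in for tokens.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration: sizes are in characters,
    used here as a rough proxy for model tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no redundant tail-only chunk is produced.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap is there so a sentence cut at a chunk boundary still appears whole in at least one chunk. Each chunk can then be sent to the model separately, or filtered first to keep only the relevant ones.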
