| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by yawnxyz 1172 days ago
	it's using "overlapping chunking" methods and it usually works for generic PDFs. It really falls apart on technical documents, SOPs and research articles where you need to get context from chunks way above. Using vector DBs also doesn't work well bc you have to twiddle around with window size / overlappy-ness, which changes depending on what kind of paper you're uploading. It's a mess and takes too long