| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by lmeyerov 472 days ago

Yeah exactly

We still want chunking in practice to avoid LLM confusion, undifferentiated embeddings, and handling large datasets at lower cost + large volumes. Large context means we can now tolerate multi-paragraph/page, so more like chunk by coherent section.

In theory we can do entire chapter/book, but those other concerns come in, so I only see more niche tools or talk-to-your-PDF do that.

At the same time, embedding is often a significant cost in above scenarios, so I'm curious about the semantic chunking overheads..