

by HanClinto 685 days ago

Semantic chunking. This is an intriguing idea.

I feel like one could do this with a chain of LLM prompts: first extract the primary subjects or topics from the long document, then prompt again (one topic at a time?) to pull out everything related to each topic and collate it into one semantic chunk.
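A rough sketch of that two-pass prompt chain, in Python. Everything named here is hypothetical: `semantic_chunks`, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns text) are assumptions for illustration, and `fake_llm` is a keyword-matching stand-in so the example runs without a real model.

```python
import re
from typing import Callable, Dict


def semantic_chunks(document: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Two-pass chunking: ask for topics, then collect text per topic."""
    # Pass 1: ask for the primary topics, one per line.
    topics_prompt = (
        "List the primary subjects or topics in the document below, "
        "one per line.\n\n" + document
    )
    topics = [line.strip("-* \t") for line in llm(topics_prompt).splitlines()]
    topics = [t for t in topics if t]

    # Pass 2: one prompt per topic, collating related text into one chunk.
    chunks: Dict[str, str] = {}
    for topic in topics:
        extract_prompt = (
            f"Pull out everything related to the topic '{topic}' from the "
            "document below and collate it into one passage.\n\n" + document
        )
        chunks[topic] = llm(extract_prompt).strip()
    return chunks


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; uses keyword matching only."""
    if prompt.startswith("List the primary"):
        return "cat\ndog"
    topic = prompt.split("'")[1]
    doc = prompt.split("\n\n", 1)[1]
    sentences = re.split(r"(?<=\.)\s+", doc)
    return " ".join(s for s in sentences if topic in s.lower())


doc = "Cats sleep a lot. Dogs like walks. Cats purr."
chunks = semantic_chunks(doc, fake_llm)
for topic, text in chunks.items():
    print(f"{topic}: {text}")
```

With a real model behind `llm`, this makes one call to list topics plus one full-document call per topic, so the number of calls grows with the number of topics.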

At the very least, a dataset / benchmark centered around this task feels like it would be really useful.

1 comment

Tostino 685 days ago

Yeah, I do think that's possible with an LLM, it's just too slow and expensive to be usable in most settings.
