Hacker News
by HanClinto 685 days ago
Semantic chunking. This is an intriguing idea.

I feel like one could do this with a chain of LLM prompts -- first extract the primary subjects or topics from the long document, then prompt again (one topic at a time?) to pull out everything related to each topic and collate it into one semantic chunk.
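A rough sketch of what that prompt chain might look like, in Python. Everything here is an assumption for illustration: `llm` is a hypothetical callable that takes a prompt string and returns the completion text (swap in whatever client you actually use), and the prompt wording and the function names `extract_topics` and `semantic_chunks` are made up, not taken from any library.

```python
from typing import Callable, Dict, List

# Hypothetical interface: takes a prompt, returns the model's text reply.
LLM = Callable[[str], str]


def extract_topics(document: str, llm: LLM) -> List[str]:
    """Step 1: ask the model for the document's primary topics, one per line."""
    prompt = (
        "List the primary subjects or topics of the document below, "
        "one per line.\n\n" + document
    )
    topics: List[str] = []
    for line in llm(prompt).splitlines():
        # Drop bullet markers and whitespace the model may add around a topic.
        topic = line.strip(" -*\t")
        if topic and topic not in topics:  # skip blanks and duplicates
            topics.append(topic)
    return topics


def semantic_chunks(document: str, llm: LLM) -> Dict[str, str]:
    """Step 2: one prompt per topic, collating related passages into a chunk."""
    chunks: Dict[str, str] = {}
    for topic in extract_topics(document, llm):
        prompt = (
            f"Copy out every passage related to the topic '{topic}' "
            "from the document below and collate it into one block of text.\n\n"
            + document
        )
        chunks[topic] = llm(prompt).strip()
    return chunks
```

Note that this makes one call for the topic list plus one call per topic, and every call includes the full document, so the cost grows with both document length and topic count.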

At the very least, a dataset / benchmark centered around this task feels like it would be really useful.

1 comment

Yeah, I do think that's possible with an LLM, it's just too slow and expensive to be usable in most settings.