by Tostino 690 days ago
I wish there were a similar model, but for (long-context) text.

It would be extremely useful to be able to semantically "chunk" text for RAG applications, rather than relying on the generally naive strategies employed today.

If I somehow overlooked one, I'd be very interested in hearing about what you've seen.

1 comments

Semantic chunking. This is an intriguing idea.

I feel like one could do this with a chain of LLM prompts -- first extract the primary subjects or topics from the long document, then prompt again (one at a time?) to pull out everything related to each topic and collate it into one semantic chunk per topic.
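
Something like this rough sketch of the two-pass chain, maybe. Everything here is an assumption for illustration: `llm` is a hypothetical callable that takes a prompt string and returns the model's reply as a string (wrap whatever client you use), and the function name `semantic_chunks`, the prompt wording, and the `max_topics` cap are made up, not from any library.

```python
import re
from typing import Callable, Dict, List


def semantic_chunks(
    document: str,
    llm: Callable[[str], str],  # hypothetical: prompt in, reply text out
    max_topics: int = 10,       # illustrative cap on the number of topics
) -> Dict[str, str]:
    """Two-pass chunking: list the topics, then collect the text for each one."""
    # Pass 1: ask for the primary topics, one per line.
    reply = llm(
        f"List up to {max_topics} primary topics of the document below, "
        f"one per line.\n\n{document}"
    )

    topics: List[str] = []
    for line in reply.splitlines():
        # Drop list markers such as "- ", "* ", "1. " or "2) ".
        topic = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line).strip()
        if topic and topic not in topics:
            topics.append(topic)
    topics = topics[:max_topics]

    # Pass 2: one prompt per topic, each over the full document.
    chunks: Dict[str, str] = {}
    for topic in topics:
        chunks[topic] = llm(
            f"Quote every passage of the document below that relates to the "
            f"topic '{topic}', keeping the original order.\n\n{document}"
        ).strip()
    return chunks
```

As written, this makes one call to list the topics plus one call per topic, and every call sends the whole document.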

At the very least, a dataset / benchmark centered around this task feels like it would be really useful.

Yeah, I do think that's possible with an LLM; it's just too slow and expensive to be usable in most settings.