by Tostino
690 days ago
I wish there were a model like this for (long-context) text. It would be extremely useful to be able to semantically "chunk" text for RAG applications, compared to the generally naive strategies employed today. If I somehow overlooked one, I'd be very interested in hearing about what you've seen.
|
I feel like one could do this with a chain of LLM prompts: first extract the primary subjects or topics from the long document, then prompt again (one topic at a time?) to pull out everything related to each topic and collate it into one semantic chunk.
At the very least, a dataset / benchmark centered around this task feels like it would be really useful.
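For what it's worth, here is a rough sketch of the two-pass prompt chain described above. It assumes a hypothetical `llm` callable that takes a prompt string and returns the model's text reply (swap in whatever client you actually use). The prompt wording and the function names are made up for illustration, and it also assumes the whole document fits in the model's context window, which is the hard part in practice.

```python
from typing import Callable, Dict, List

# `LLM` is a stand-in type (an assumption): any function mapping prompt -> reply text.
LLM = Callable[[str], str]


def extract_topics(document: str, llm: LLM) -> List[str]:
    """Pass 1: ask the model for the primary topics, one per line."""
    prompt = (
        "List the primary topics of the document below, one per line.\n\n"
        + document
    )
    topics: List[str] = []
    for line in llm(prompt).splitlines():
        # Drop list bullets and surrounding whitespace from each reply line.
        topic = line.strip().lstrip("-* ").strip()
        if topic and topic not in topics:  # skip blanks and duplicates
            topics.append(topic)
    return topics


def semantic_chunks(document: str, llm: LLM) -> Dict[str, str]:
    """Pass 2: one prompt per topic, collecting related text into one chunk."""
    chunks: Dict[str, str] = {}
    for topic in extract_topics(document, llm):
        prompt = (
            f"Topic: {topic}\n"
            "Copy every passage from the document below that relates to this "
            "topic, and nothing else.\n\n"
            + document
        )
        chunks[topic] = llm(prompt).strip()
    return chunks
```

Calling `semantic_chunks(text, llm)` returns a dict mapping each topic to its collated chunk. Note this costs one model call per topic plus one for the topic list, and nothing here checks that the model copied text faithfully, which is exactly where a benchmark would help.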