Hacker News
by ekianjo 477 days ago
> time and generate possible questions from the LLM based on the context of the current semantically split chunk.

Possible, but very compute-intensive. Imagine if you have hundreds of thousands of chunks...

1 comment

The number of chunks would be the same under either approach.

The generation of questions can be done out-of-band by a cheaper model.
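A minimal sketch of what "out-of-band" could look like, assuming a retrieval setup where questions are generated once per chunk at indexing time and each request only does a lookup. `generate_questions` is a hypothetical stub standing in for the cheaper model, and token-overlap (Jaccard) similarity is swapped in for embedding similarity so the sketch runs offline. This is not the implementation being discussed in the thread.

```python
def generate_questions(chunk: str) -> list[str]:
    """Hypothetical stand-in for a call to a cheaper LLM.

    A real pipeline would prompt a model; here we template one question
    from the chunk's first sentence so the sketch runs without a model.
    """
    first_sentence = chunk.split(".")[0].strip()
    return [f"What does the text say about {first_sentence.lower()}?"]


def tokenize(text: str) -> set[str]:
    # Crude lowercase word set; a real system would use embeddings instead.
    words = (w.strip("?.,!").lower() for w in text.split())
    return {w for w in words if w}


def build_question_index(chunks: list[str]) -> list[tuple[set[str], int]]:
    """Run once, out-of-band: one generation call per chunk."""
    index = []
    for chunk_id, chunk in enumerate(chunks):
        for question in generate_questions(chunk):
            index.append((tokenize(question), chunk_id))
    return index


def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap of two word sets, in [0, 1].
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, index: list[tuple[set[str], int]], chunks: list[str]) -> str:
    """Per request: a cheap similarity lookup, no LLM call."""
    q = tokenize(query)
    _, best_id = max(index, key=lambda entry: jaccard(q, entry[0]))
    return chunks[best_id]


chunks = [
    "Postgres supports JSONB columns. They allow indexing.",
    "Redis is an in-memory store. It is fast.",
]
index = build_question_index(chunks)
print(retrieve("what about redis in-memory store", index, chunks))
```

The design point is where the expensive step runs: `build_question_index` makes one model call per chunk and can be scheduled in a batch job, while `retrieve` does no generation at all.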

Their current implementation seems to require some computation per request. It's a trade-off, and you would have to measure which strategy provides the most value.

Overall, responses would be faster, since question generation would no longer sit in the request path.
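One way to frame the trade-off is a break-even count, under the simplifying assumption that offline generation costs a fixed amount per chunk (paid once) and the per-request approach costs a fixed amount per request. The cost figures below are hypothetical relative units, not measurements, and the model ignores latency, which the comment above also raises.

```python
def offline_cost(num_chunks: int, cost_per_chunk: float) -> float:
    # Paid once: one question-generation call per chunk (cheaper model).
    return num_chunks * cost_per_chunk


def online_cost(num_requests: int, cost_per_request: float) -> float:
    # Paid again on every request by the per-request computation step.
    return num_requests * cost_per_request


def break_even_requests(num_chunks: int, cost_per_chunk: float,
                        cost_per_request: float) -> float:
    """Request count at which both strategies cost the same."""
    if cost_per_request <= 0:
        raise ValueError("cost_per_request must be positive")
    return offline_cost(num_chunks, cost_per_chunk) / cost_per_request


def cheaper_strategy(num_chunks: int, num_requests: int,
                     cost_per_chunk: float, cost_per_request: float) -> str:
    # Compare total cost of each approach for an expected request volume.
    off = offline_cost(num_chunks, cost_per_chunk)
    on = online_cost(num_requests, cost_per_request)
    return "offline" if off < on else "per-request"


# Hypothetical numbers: 200,000 chunks, 1 unit per chunk offline, 4 units per request.
print(break_even_requests(200_000, 1, 4))
print(cheaper_strategy(200_000, 100_000, 1, 4))
```

With those made-up figures the one-time indexing cost is recovered after 50,000 requests; beyond that, generating questions out-of-band is the cheaper option.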