The number of chunks would be the same under either approach.
The generation of questions can be done out-of-band by a cheaper model.
Their current implementation seems to require some computation per request. Choosing between the two is a trade-off, and it would take some evaluation to see which strategy provides the most value.
Responses would be faster overall.
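To make the out-of-band idea concrete, here is a minimal sketch, not their implementation. It assumes questions are generated once per chunk ahead of time and stored in an index, so a request only needs a lookup. The names `build_question_index`, `toy_generate_questions`, and `lookup` are hypothetical, and `toy_generate_questions` is a stand-in for a call to the cheaper model.

```python
from typing import Callable, Dict, List, Optional


def toy_generate_questions(chunk: str) -> List[str]:
    # Hypothetical stand-in for the cheaper model: builds one question
    # from the first word of the chunk.
    first_word = chunk.split()[0]
    return [f"What is {first_word}?"]


def build_question_index(
    chunks: List[str],
    generate_questions: Callable[[str], List[str]],
) -> Dict[str, int]:
    # Offline (out-of-band) step: run the generator once per chunk and
    # map each lowercased question to the index of its source chunk.
    index: Dict[str, int] = {}
    for i, chunk in enumerate(chunks):
        for question in generate_questions(chunk):
            index[question.lower()] = i
    return index


def lookup(index: Dict[str, int], chunks: List[str], query: str) -> Optional[str]:
    # Request-time step: a dictionary lookup, with no model call.
    i = index.get(query.lower())
    return chunks[i] if i is not None else None


chunks = [
    "Caching stores results for reuse.",
    "Batching groups requests together.",
]
index = build_question_index(chunks, toy_generate_questions)
answer = lookup(index, chunks, "what is caching?")
```

The exact-match lookup is only for illustration. A real system would more likely match the query against the stored questions by similarity, but the split between offline generation and cheap request-time work would stay the same.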