Hacker News
by EngineeringStuf 479 days ago
The number of chunks would be the same with either approach.

The generation of questions can be done out-of-band by a cheaper model.

Their current implementation seems to require some computation per request. It would be a trade-off, and you'd have to weigh which strategy provides the most value.

Responses would be faster overall.
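
To make the out-of-band idea concrete, here is a rough sketch, not anyone's actual implementation. `generate_questions` is a hypothetical stand-in for a call to a cheaper model, and the token-overlap matching is only a placeholder for whatever similarity search a real system would use (most likely embeddings). The point is that the questions are generated once, offline, and the request path makes no model call.

```python
import re


def generate_questions(chunk: str) -> list[str]:
    """Hypothetical stand-in for a cheaper model; a real system would call one here."""
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    return [f"What does the text say about: {s}?" for s in sentences]


def build_index(chunks: list[str]) -> list[tuple[str, int]]:
    """Offline (out-of-band) step: precompute questions for every chunk."""
    index = []
    for chunk_id, chunk in enumerate(chunks):
        for question in generate_questions(chunk):
            # Each question points back at its chunk; the chunks themselves are unchanged.
            index.append((question, chunk_id))
    return index


def tokens(text: str) -> set[str]:
    """Lowercased word set; a placeholder for an embedding in a real system."""
    return set(re.findall(r"\w+", text.lower()))


def lookup(index: list[tuple[str, int]], query: str) -> int:
    """Request-time step: compare the query to precomputed questions, no model call."""
    query_tokens = tokens(query)
    _, chunk_id = max(index, key=lambda entry: len(query_tokens & tokens(entry[0])))
    return chunk_id


chunks = ["Cats sleep most of the day.", "Rust has a borrow checker."]
index = build_index(chunks)           # run once, ahead of time
hit = lookup(index, "borrow checker in rust")
print(chunks[hit])
```

The index grows with the number of generated questions, but the chunk count stays the same, and the per-request work is reduced to a lookup.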