by EngineeringStuf
479 days ago
The number of chunks is the same under either approach. The questions can be generated out-of-band by a cheaper model, whereas their current implementation seems to require some computation on every request. Which strategy provides the most value is a trade-off that would have to be weighed, but with the work moved out of the request path, responses should be faster overall.
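
To make the trade-off concrete, here is a rough sketch of the out-of-band idea. It is not their actual implementation: every name in it (`build_question_index`, `answer_request`, `StubQuestionModel`) is made up for illustration, the stub stands in for a cheaper model, and plain word overlap stands in for whatever similarity search would really be used.

```python
class StubQuestionModel:
    """Hypothetical stand-in for a cheap model; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text):
        self.calls += 1
        # A real model would return several plausible questions per chunk.
        return [f"what is {text.lower()}"]


def build_question_index(chunks, generate_questions):
    """Offline step: one model call per chunk, results stored for later."""
    return {chunk_id: generate_questions(text) for chunk_id, text in chunks.items()}


def answer_request(query, index):
    """Request-time step: lookup only, no model call.

    Returns the id of the chunk whose stored questions share the most
    words with the query, or None if nothing overlaps.
    """
    words = set(query.lower().split())
    best_id, best_overlap = None, 0
    for chunk_id, questions in index.items():
        for question in questions:
            overlap = len(words & set(question.lower().split()))
            if overlap > best_overlap:
                best_id, best_overlap = chunk_id, overlap
    return best_id
```

The model runs once per chunk when the index is built, so the per-request cost is just the lookup. Whether that beats doing the computation per request depends on how often the chunks change and how much traffic there is.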