by froh
6 hours ago

> GPUs are extremely underutilized if you launch just 1 generation stream

Why is that? b/c the thing is waiting for the hoooman and idling? Or some parallelizable interleaving steps? I have no intuition yet how this works under the hood.

Waiting for the hooman (or tool calls) won't help either, of course.