|
|
|
|
|
by zozbot234
7 days ago
|
|
There's been some very rough experiments with batching on Apple Silicon (and that's not a highly suitable platform since the compute/thermals bottleneck hits sooner than elsewhere) that seem to be broadly consistent with what I argued, showing as much as 2x total decode throughput with an 8-wide batch. That's substantial in this context. |
|