by wokkel
110 days ago
I read (but cannot find it anymore) that the information sent from layer to layer is minimal. The actual matrix work happens within a layer. They are not doing matrix multiplication over the network (that would be insane latency-wise).
Yes, the latency hurts performance; that’s why it’s only achieving ~8 tok/s.
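To put rough numbers on the "layer-to-layer traffic is minimal" point, here is a back-of-the-envelope sketch. The hidden size (8192) and 2-byte values are assumptions picked for illustration, not figures from the thread, and the function names are hypothetical. It compares the bytes handed from one layer to the next for a single token with the bytes in one square weight matrix, and converts the quoted ~8 tok/s into a per-token time budget.

```python
# Rough sizing sketch. HIDDEN and the 2-byte value width are assumed,
# illustrative numbers; they are not taken from the thread.

def activation_bytes(hidden_size, bytes_per_value=2):
    """Bytes passed from one layer to the next for a single token."""
    return hidden_size * bytes_per_value

def weight_matrix_bytes(hidden_size, bytes_per_value=2):
    """Bytes in one hidden_size x hidden_size weight matrix."""
    return hidden_size * hidden_size * bytes_per_value

def ms_per_token(tokens_per_second):
    """Wall-clock budget per generated token, in milliseconds."""
    return 1000 / tokens_per_second

HIDDEN = 8192  # assumed hidden size

act = activation_bytes(HIDDEN)
weights = weight_matrix_bytes(HIDDEN)

print(f"activation per token: {act} bytes")
print(f"one weight matrix:    {weights} bytes")
print(f"ratio:                {weights // act}x")
print(f"budget at 8 tok/s:    {ms_per_token(8)} ms per token")
```

Under these assumptions the per-token hand-off is thousands of times smaller than a single weight matrix, which is why keeping the matrix work local and shipping only the activations is plausible. Every network hop still has to fit inside the per-token budget, so latency caps throughput.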