by mirekrusin
1068 days ago
The claims are parallelism for training, which is not a fixed speedup; a different complexity for inference (constant time); and a different complexity for large-context inference (linear). So nothing here can be summarised as "8x". Or am I getting this summary wrong?