by menaerus
602 days ago
Maybe it's because the assumption of low latency, which rests on everything fitting in SRAM, doesn't hold? CS-1 had 18 GB of SRAM, CS-2 extended it to 40 GB, and CS-3 has 44 GB. None of these is enough to run inference on Llama 70B (roughly 140 GB of weights at 16-bit precision), and much less so for even larger models.
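A rough back-of-the-envelope sketch of that arithmetic, assuming 70e9 parameters, 2 bytes per parameter (FP16/BF16), and decimal gigabytes. It counts weights only and ignores KV cache and activations, so the real footprint would be larger; a lower-precision format would shrink it. The helper names are made up for illustration.

```python
import math

# SRAM capacities quoted above, in GB.
SRAM_GB = {"CS-1": 18, "CS-2": 40, "CS-3": 44}

# Assumptions (not from the original comment): parameter count and precision.
PARAMS = 70e9          # Llama 70B, nominal parameter count
BYTES_PER_PARAM = 2    # 16-bit weights


def weights_gb(params: float, bytes_per_param: float) -> float:
    """Size of the model weights alone, in decimal GB."""
    return params * bytes_per_param / 1e9


def devices_needed(model_gb: float, sram_gb: float) -> int:
    """Minimum number of devices if weights are split evenly across SRAM."""
    return math.ceil(model_gb / sram_gb)


model_gb = weights_gb(PARAMS, BYTES_PER_PARAM)
for name, sram in SRAM_GB.items():
    fits = model_gb <= sram
    print(f"{name}: {sram} GB SRAM, weights {model_gb:.0f} GB, "
          f"fits={fits}, min devices={devices_needed(model_gb, sram)}")
```

Under these assumptions the weights alone exceed the SRAM of any single system several times over, so they would have to be split across multiple systems or streamed from outside SRAM.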