by alecco
621 days ago
If Cerebras keeps improving, it will be a decent contender to Nvidia. On Nvidia GPUs, the transfer between VRAM and on-chip SRAM is a bottleneck. Even for inference alone, the GPU has to load the full model from VRAM at least once per generated token (a cost that is divided across the batch, since one pass over the weights serves every sequence in it). The bottleneck is not the Tensor Cores but memory transfers. They say so themselves. Cerebras fixes that, at the cost of more software complexity and a narrower range of target workloads.
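To put rough numbers on the "once per token, divided by batch size" point, here is a minimal back-of-the-envelope sketch. The function name and all three figures (70B parameters, 2 bytes per parameter, 3 TB/s of memory bandwidth) are illustrative assumptions of mine, not specs for any particular GPU, and the bound ignores KV-cache reads, compute time, and everything else that makes real throughput lower.

```python
def max_tokens_per_second(param_count, bytes_per_param, bandwidth_bytes_per_s, batch_size=1):
    """Upper bound on decode throughput if streaming the weights were the only cost."""
    model_bytes = param_count * bytes_per_param
    # One full pass over the weights yields one new token per sequence in the batch.
    passes_per_second = bandwidth_bytes_per_s / model_bytes
    return passes_per_second * batch_size


# Illustrative assumptions, not measurements of real hardware.
PARAMS = 70e9           # hypothetical 70B-parameter model
BYTES_PER_PARAM = 2     # 16-bit weights
BANDWIDTH = 3e12        # hypothetical 3 TB/s between VRAM and the chip

for batch in (1, 8, 64):
    rate = max_tokens_per_second(PARAMS, BYTES_PER_PARAM, BANDWIDTH, batch)
    print(f"batch={batch:3d}: at most {rate:,.0f} tokens/s across the batch")
```

Under these assumptions a single sequence tops out at roughly 21 tokens/s no matter how fast the Tensor Cores are, and the only lever in this model is a bigger batch. In practice the linear scaling stops once the batch is large enough to make the workload compute-bound.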