Hacker News new | ask | show | jobs
by alecco 621 days ago
If Cerebras keeps improving it will be a decent contender to Nvidia. Nvidia VRAM-SRAM is a bottleneck. For just inference, it needs to download a model at least once per token (divided by batch size). The bottleneck is not Tensor Cores but memory transfers. They say it themselves. Cerebras fixes that (at a cost of software complexity and narrower target solution).