
It doesn't look like Cerebras mentioned the most important part: because they're running on a vastly more capable system, they can trade away model complexity. That means they could refactor that 600-line model effortlessly and rerun it.

They could watch individual layers train and work out how to optimize training, quantization, etc.

It feels like they kinda missed the forest for the trees here. The article should have focused on optimizing the model architecture, given how few lines of code the model is and how ridiculous the system's training capacity is.