| "trainable_params 12,810" laughs (for comparison, GPT-3 has 175,000,000,000 parameters). Can Apple's M1 help you train tiny toy examples with no real-world relevance? You bet it can! Plus, it looks like they are comparing Apples to Oranges ;) This seems to be 16-bit precision on the M1 and 32-bit on the V100, so the M1-trained model will most likely yield worse or unusable results due to the lack of precision. And lastly, they are plainly testing against the wrong target. The V100 is great, but it is far from NVIDIA's flagship for training small low-precision models. At the FP16 that the M1 is using, the correct target would have been an RTX 3090 or the like, which has 35 TFLOPS. The V100 only gets 14 TFLOPS because it lacks the dedicated TensorRT accelerator hardware. So they compare the M1 against an NVIDIA model from 2017 that lacks the relevant hardware acceleration and is thus a whopping 60% slower (14 vs. 35 TFLOPS) than what people actually use for such training workloads. I'm sure my bicycle would also compare very favorably against a car that is missing two wheels :p |