by bufo
1107 days ago
The Neural Engine has severe limitations at the moment. I tried using it for BERT about a year ago and kept crashing its API because of "out of memory" issues. The theoretical TOPS you mention also don't necessarily translate into usable TOPS, because memory bandwidth and caches often become the bottleneck before the compute units do. This is why, for example, the comparison of the M1 Max with an RTX 3090 was completely off.
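To make the bandwidth point concrete, here is a minimal roofline-style sketch. The peak and bandwidth figures are illustrative assumptions, not measurements of any real chip, and `attainable_tops` is a hypothetical helper name. The idea is just that attainable throughput is the smaller of peak compute and (memory bandwidth × operations performed per byte moved).

```python
def attainable_tops(peak_tops: float, bandwidth_gbs: float, ops_per_byte: float) -> float:
    """Roofline estimate: throughput is capped by compute or by memory traffic."""
    # GB/s * ops/byte = Gops/s; divide by 1000 to express it in Tops/s.
    memory_bound_tops = bandwidth_gbs * ops_per_byte / 1000.0
    return min(peak_tops, memory_bound_tops)


# Illustrative numbers only (assumptions, not specs of a particular chip).
PEAK_TOPS = 10.0       # advertised peak throughput
BANDWIDTH_GBS = 400.0  # memory bandwidth in GB/s

for ops_per_byte in (1.0, 10.0, 100.0):
    usable = attainable_tops(PEAK_TOPS, BANDWIDTH_GBS, ops_per_byte)
    print(f"{ops_per_byte:6.1f} ops/byte -> {usable:5.2f} usable TOPS of {PEAK_TOPS} peak")
```

With these made-up numbers, a workload only reaches the advertised peak once it performs at least 25 operations per byte fetched; anything less is limited by memory, which is why headline TOPS alone make for poor cross-chip comparisons.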
|
My daily work involves running a lot of models on Apple hardware (Apple Silicon and A1# chips with the Neural Engine) using CoreML, often PyTorch models converted with coremltools. The performance of the Apple chips is spectacular if the intrinsics are supported (things obviously get dicier with currently unsupported ops). I mean, the memory bandwidth of the M2 Ultra (800 GB/s) is within spitting distance of the GDDR6X on the RTX 4090 (about 1 TB/s).
People aren't going to be replacing H100 arrays with Apple Silicon, and even as a fan I use NVIDIA hardware for training and convert the models to CoreML after the fact, but Apple clearly isn't satisfied with being some toy. They keep climbing up that vine.