| HN Mirror

The M1 Max has an unusably slow GPU for inference. TTFT (time to first token) on real-world context lengths can exceed 10 minutes.
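For a sense of why long prompts hurt so much: prefill is roughly compute-bound, so TTFT scales with prompt length divided by sustained GPU throughput. Below is a back-of-envelope sketch, assuming the common ~2 × params × tokens FLOP approximation for a dense transformer forward pass. The model size, prompt length and throughput are hypothetical inputs, not measurements of any particular machine.

```python
# Back-of-envelope TTFT estimate, assuming prefill dominates and is compute-bound:
#   TTFT ~= prefill FLOPs / sustained FLOP/s
# Uses the rough ~2 * params * tokens approximation for a dense transformer.
# It ignores the attention term, which grows quadratically with context,
# so real long-context prefill is slower than this estimate.

def prefill_flops(n_params: float, n_tokens: int) -> float:
    """Approximate forward-pass FLOPs to process a prompt of n_tokens."""
    return 2.0 * n_params * n_tokens


def ttft_seconds(n_params: float, n_tokens: int, sustained_flops: float) -> float:
    """Estimated time to first token in seconds."""
    return prefill_flops(n_params, n_tokens) / sustained_flops


if __name__ == "__main__":
    # Hypothetical inputs (assumed, not measured): 70B-parameter model,
    # 100k-token prompt, 10 TFLOP/s sustained GPU throughput.
    t = ttft_seconds(70e9, 100_000, 10e12)
    print(f"estimated TTFT: {t / 60:.1f} min")
```

With those assumed numbers the estimate lands well past the 10-minute mark, before even counting attention cost.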

> Nothing new here, apart from being able to use CUDA on a less power hungry system.

CUDA has been running on ARM SoCs since the Tegra K1, 12 years ago. Nvidia is not new to ARM, nor is CUDA.