by bigyabai 18 days ago
The M1 Max's GPU is unusably slow for inference. Time to first token (TTFT) on real-world contexts can be over 10 minutes.

> Nothing new here, apart from being able to use CUDA on a less power hungry system.

CUDA has been running on ARM SoCs since the Tegra K1, 12 years ago. Nvidia is not new to ARM, nor is CUDA.