by drusepth 1256 days ago
Does M1/M2 really outperform CUDA on beefy ML GPUs in tasks like this? I'd love to see numbers if so; this seems extremely surprising.
2 comments

M1 performs better in some realtime use-cases because of the unified memory: the GPU and ML hardware can work on a camera framebuffer directly without any copy.

CUDA always requires sending data over the PCIe bus, at least when it comes to realtime camera processing. GPUDirect exists, but it's optimized for disks and NICs; I don't believe it's possible to use it with cameras.
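
To put rough numbers on the copy argument, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the helper names are made up, the frame format and the bandwidth figure are placeholders rather than measurements, and it ignores per-transfer latency and synchronization, which often matter more than raw bandwidth for realtime work.

```python
# Rough estimate of how much of a frame's time budget goes to bus copies.
# All names and numbers here are hypothetical placeholders, not measurements.

def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel

def copy_time_ms(n_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to move n_bytes at a given effective bandwidth (no fixed latency modeled)."""
    return n_bytes / bandwidth_bytes_per_s * 1000.0

def budget_fraction(copy_ms: float, fps: float, copies_per_frame: int = 2) -> float:
    """Fraction of the per-frame budget spent copying (e.g. one copy in, one copy out)."""
    frame_budget_ms = 1000.0 / fps
    return copies_per_frame * copy_ms / frame_budget_ms

# Assumed inputs: a 1080p RGBA frame and a placeholder effective bandwidth.
size = frame_bytes(1920, 1080, 4)
t = copy_time_ms(size, 10e9)  # 10e9 B/s is an assumed figure, plug in your own
print(f"{size} bytes per frame, {t:.3f} ms per copy, "
      f"{budget_fraction(t, 60.0):.1%} of a 60 fps budget")
```

With unified memory the copy term drops out entirely, so the interesting question for any given pipeline is how large that term is compared to the compute time.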

No idea, actually. I just keep seeing odd benchmarks where the unified memory architecture on the M1/M2 gives surprisingly good performance, because other CPU/GPU combinations take a hit from DMA transfers. It’s far from universal, but it’s been surprising to see, and this looked like it might be one of those cases: camera decoding, then ML and GPU processing, then “rendering” back out. That’s where it might have had some benefit, hence my “wondering out loud”.