M1 performs better in some realtime use-cases because of the unified memory: the GPU and ML hardware can work on a camera framebuffer directly without any copy.
CUDA always requires sending data over the PCIe bus, at least when it comes to realtime camera processing. GPUDirect exists, but it's optimized for disks and NICs; I don't believe it's possible to use it with cameras.
No idea actually. I just find all sorts of odd benchmarks crop up where the Unified Memory Architecture on the M1/M2 gives surprisingly good performance, due to the DMA transfer performance hit on other CPU/GPU combinations… It’s far from universal, but it’s been surprising to see, and this looked like the sort of thing that might be one of them: between the camera decoding, the ML & GPU processing, and then “rendering” back out, it might have had some benefits. Hence my “wondering out loud”.