by jimduk
2031 days ago
Dumb question - if I want to do simple image processing on a Pi 4 (2D FFTs, small kernels, summing 2D arrays along one dimension, finding maxima) and I care about performance, is this a reasonable stack to use, with decent prospects? Or is it faster/safer to stick with the Arm CPU, despite the GPU? The workload is 1k x 1k monochrome images at 3-10 fps (or more).
The Jetson Nano seems to be the obvious commodity alternative with GPU access, but it is pricier and has a smaller ecosystem.
|
If you have 1M source floats and want 60 FPS, that translates to only 6 GFlops, even with a budget of about 100 FLOPs per source float.
On the Pi 4, the GPU can do 32 GFlops on paper. Also on paper, each CPU core can do 8 FLOPs/cycle, which with 4 cores at 1.5 GHz translates to 48 GFlops. That’s assuming you know what you’re doing: writing manually vectorized C++ (http://const.me/articles/simd/NEON.pdf), abusing FMA, using OpenMP or similar for parallelism, and having a heat sink and ideally a fan.
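To make the back-of-the-envelope numbers above easy to redo for other frame rates, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and the 100 FLOPs per source float is an assumption: it is the per-pixel budget that makes 1M floats at 60 FPS come out at 6 GFlops, not a measured cost of any particular pipeline.

```cpp
#include <cassert>

// Theoretical peak in GFlops: cores * clock in GHz * FLOPs per cycle per core.
// These are paper numbers; real code rarely gets close to them.
double peakGflops(int cores, double clockGHz, int flopsPerCyclePerCore) {
    assert(cores > 0 && clockGHz > 0.0 && flopsPerCyclePerCore > 0);
    return cores * clockGHz * flopsPerCyclePerCore;
}

// Throughput needed for a per-pixel pipeline, in GFlops.
// flopsPerPixel is an ASSUMED budget, not something measured.
double requiredGflops(double pixels, double fps, double flopsPerPixel) {
    assert(pixels > 0.0 && fps > 0.0 && flopsPerPixel > 0.0);
    return pixels * fps * flopsPerPixel / 1e9;
}
```

Calling `peakGflops(4, 1.5, 8)` gives the 48 GFlops CPU figure, and `requiredGflops(1e6, 60, 100)` gives the 6 GFlops budget. Under the same assumption, 1M pixels at 10 fps needs roughly 1 GFlops.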
So you’re probably fine with either the CPU or the GPU. Personally, I would start with NEON for that. C++ compilers have had really good support for it for a decade now. These Vulkan drivers are brand new, and GLES 3.1, which added GPGPU, is not much older. I would expect bugs in both the compiler and the runtime, and these can get very expensive to work around.
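As a rough idea of what the NEON route looks like for one of the operations in the question (summing a 2D array along one dimension), here is a minimal sketch. It assumes a row-major float image; the function name `columnSums` is hypothetical. It uses plain vector adds, not FMA, and falls back to scalar code when the compiler does not define `__ARM_NEON`, so it also builds on a non-Arm machine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Sum a row-major image over its rows, producing one total per column.
std::vector<float> columnSums(const float* image, std::size_t rows, std::size_t cols) {
    assert(image != nullptr || rows * cols == 0);
    std::vector<float> sums(cols, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = image + r * cols;
        std::size_t c = 0;
#if defined(__ARM_NEON)
        // NEON path: load, add and store four columns at a time.
        for (; c + 4 <= cols; c += 4) {
            float32x4_t acc = vld1q_f32(&sums[c]);
            acc = vaddq_f32(acc, vld1q_f32(row + c));
            vst1q_f32(&sums[c], acc);
        }
#endif
        // Scalar path: leftover columns, or every column without NEON.
        for (; c < cols; ++c)
            sums[c] += row[c];
    }
    return sums;
}
```

Both paths add the rows in the same order for each column, so they give identical results. For 1k x 1k images the next steps would be splitting the columns across cores (OpenMP or similar) and checking whether the compiler's auto-vectorized scalar loop is already as fast.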
While I don’t have any experience with the Jetson, on paper it’s awesome, with 472 GFlops. Although the community is way smaller, nVidia is doing a better job of supplying libraries. The CUDA toolkit has lots of good stuff, e.g. cuFFT (I have used CUDA, cuFFT and other parts, just not on a Jetson).