HN Mirror

Hacker News


	by jimduk 2031 days ago
	Dumb question: if I want to do simple image processing on a Pi 4 (2D FFTs, small kernels, summing 2D arrays along one dimension, finding maxima) and I care about performance, is this a reasonable stack to use, with decent prospects, or is it faster/safer to stick to the ARM CPU despite the GPU? The images are 1k x 1k monochrome, at 3-10 fps (or more). The Jetson Nano seems to be the obvious commodity alternative with GPU access, but it is pricier and has a smaller ecosystem.
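For reference, two of those operations are simple on the ARM CPU even without any tuning. Below is a minimal C++ sketch. It assumes row-major `float` images, and `Peak`, `find_max` and `row_sums` are hypothetical names. "Finding maxima" is read here as locating the single global maximum. Finding local maxima would need a neighbourhood comparison on top of this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Location and value of the brightest pixel in a row-major width x height image.
struct Peak {
    std::size_t x;
    std::size_t y;
    float value;
};

// Global maximum in one pass; on ties the first occurrence (in row-major order) wins.
Peak find_max(const std::vector<float>& img, std::size_t width, std::size_t height) {
    assert(!img.empty() && img.size() == width * height);
    Peak best{0, 0, img[0]};
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const float v = img[y * width + x];
            if (v > best.value) best = {x, y, v};
        }
    }
    return best;
}

// Sum along x: one value per row (a horizontal projection profile).
std::vector<float> row_sums(const std::vector<float>& img, std::size_t width, std::size_t height) {
    assert(img.size() == width * height);
    std::vector<float> sums(height, 0.0f);
    for (std::size_t y = 0; y < height; ++y) {
        float acc = 0.0f;
        for (std::size_t x = 0; x < width; ++x) acc += img[y * width + x];
        sums[y] = acc;
    }
    return sums;
}
```

Both functions are single passes over the 1M pixels. At 3-10 fps they are unlikely to be the bottleneck compared with the FFTs.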

5 comments

Const-me 2031 days ago

As an extremely rough estimate, an FFT (probably the most expensive of the operations you mentioned) needs 5N*log2(N) operations.
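To see where the 5N log2(N) figure comes from, here is a textbook radix-2 Cooley-Tukey FFT. It is a minimal C++ sketch, not a tuned library like FFTW or cuFFT. `cpx`, `fft` and `fft2d` are names made up for this example, and sizes are assumed to be powers of two (assuming "1k" means 1024).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cpx = std::complex<double>;

// In-place iterative radix-2 FFT. Each butterfly is one complex multiply
// (6 real flops) plus one complex add and one complex subtract (4 flops):
// 10 flops x N/2 butterflies per stage x log2(N) stages = 5 N log2(N).
void fft(std::vector<cpx>& a) {
    const std::size_t n = a.size();
    assert((n & (n - 1)) == 0 && "size must be a power of two");
    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const cpx wlen = std::polar(1.0, -2.0 * pi / static_cast<double>(len));
        for (std::size_t start = 0; start < n; start += len) {
            cpx w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cpx u = a[start + k];
                const cpx v = a[start + k + len / 2] * w;  // 6 flops
                a[start + k] = u + v;                      // 2 flops
                a[start + k + len / 2] = u - v;            // 2 flops
                w *= wlen;  // twiddle update; tuned code uses a precomputed table
            }
        }
    }
}

// 2D FFT of a row-major width x height image: transform every row, then every column.
void fft2d(std::vector<cpx>& img, std::size_t width, std::size_t height) {
    assert(img.size() == width * height);
    std::vector<cpx> line(width);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) line[x] = img[y * width + x];
        fft(line);
        for (std::size_t x = 0; x < width; ++x) img[y * width + x] = line[x];
    }
    line.resize(height);
    for (std::size_t x = 0; x < width; ++x) {
        for (std::size_t y = 0; y < height; ++y) line[y] = img[y * width + x];
        fft(line);
        for (std::size_t y = 0; y < height; ++y) img[y * width + x] = line[y];
    }
}
```

The row pass costs H * 5W*log2(W) and the column pass costs W * 5H*log2(H). Together that is 5*W*H*(log2 W + log2 H) = 5N*log2(N) for N = W*H pixels, which is why the 2D case follows the same formula.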

If you have 1M source floats (a 1k x 1k image) and want 60 FPS, that translates to only about 6 GFlops.

On the Pi 4, the GPU can do 32 GFlops on paper. Again on paper, the CPU can do 8 FLOPs/cycle per core, which translates (4 cores at 1.5 GHz) to 48 GFlops. That’s assuming you know what you’re doing: writing manually vectorized C++ (http://const.me/articles/simd/NEON.pdf) abusing FMA, using OpenMP or similar for parallelism, and having a heat sink and ideally a fan.
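Here are the numbers from the two paragraphs above, worked through. This assumes "1k x 1k" means 1024 x 1024 pixels (so N = 2^20) and takes the on-paper peak figures at face value. Real FFT code usually reaches only a fraction of peak, so the actual margin is smaller than these ratios suggest.

```latex
\begin{aligned}
N &= 1024 \times 1024 = 2^{20}, \qquad \log_2 N = 20\\
\text{FFT work per frame} &\approx 5N\log_2 N = 5 \cdot 2^{20} \cdot 20 \approx 1.05 \times 10^{8}\ \text{flops}\\
\text{required at } 60\ \text{fps} &\approx 6.3\ \text{GFlops} \qquad (\text{at } 10\ \text{fps}: \approx 1.05\ \text{GFlops})\\
\text{CPU peak} &= 4\ \text{cores} \times 1.5 \times 10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 8\ \tfrac{\text{flops}}{\text{cycle}} = 48\ \text{GFlops}\\
\text{fraction of peak at } 60\ \text{fps} &\approx 6.3/48 \approx 13\%\ \text{(CPU)}, \qquad 6.3/32 \approx 20\%\ \text{(GPU)}
\end{aligned}
```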

So you’re probably good with either the CPU or the GPU. Personally, I would have started with NEON for that; C++ compilers have had really good support for it for a decade now. These Vulkan drivers are brand new, and GLES 3.1 support, which added GPGPU, is not much older. I would expect bugs in both the compiler and the runtime, and these can get very expensive to work around.
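Here is a minimal sketch of what starting with NEON could look like for one of the listed operations, a 3x3 kernel. It assumes a compiler that defines `__ARM_NEON` and `__ARM_FEATURE_FMA` (e.g. a 64-bit Pi OS build). The function name `filter3x3` and the zero-border policy are choices made for this example. The intrinsics used (`vdupq_n_f32`, `vld1q_f32`, `vfmaq_f32`, `vst1q_f32`) are standard Arm ACLE intrinsics, and the OpenMP pragma splits rows across the four cores when built with `-fopenmp`. On other targets, the code falls back to a scalar `std::fma` loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FMA)
#include <arm_neon.h>
#endif

// Cross-correlates a row-major width x height float image with a 3x3 kernel k
// (9 taps, row-major). Border pixels are left at 0 in this sketch.
std::vector<float> filter3x3(const std::vector<float>& src, std::size_t width,
                             std::size_t height, const float k[9]) {
    assert(src.size() == width * height);
    std::vector<float> dst(width * height, 0.0f);
    if (width < 3 || height < 3) return dst;
    const long h = static_cast<long>(height);
    // Output rows are independent, so OpenMP can split them across the cores
    // (build with -fopenmp; without it the pragma is ignored and the loop is serial).
#pragma omp parallel for
    for (long yi = 1; yi < h - 1; ++yi) {
        const std::size_t y = static_cast<std::size_t>(yi);
        const float* rows[3] = {&src[(y - 1) * width], &src[y * width], &src[(y + 1) * width]};
        float* out = &dst[y * width];
        std::size_t x = 1;
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FMA)
        // NEON path: 4 output pixels per iteration, 9 fused multiply-adds each.
        for (; x + 4 <= width - 1; x += 4) {
            float32x4_t acc = vdupq_n_f32(0.0f);
            for (int r = 0; r < 3; ++r)
                for (int c = 0; c < 3; ++c)
                    acc = vfmaq_f32(acc, vld1q_f32(rows[r] + x + c - 1),
                                    vdupq_n_f32(k[r * 3 + c]));
            vst1q_f32(out + x, acc);
        }
#endif
        // Scalar path: the whole row on non-NEON builds, the leftover pixels otherwise.
        for (; x < width - 1; ++x) {
            float acc = 0.0f;
            for (int r = 0; r < 3; ++r)
                for (int c = 0; c < 3; ++c)
                    acc = std::fma(rows[r][x + c - 1], k[r * 3 + c], acc);
            out[x] = acc;
        }
    }
    return dst;
}
```

Both paths accumulate the taps in the same order with fused multiply-adds, so the NEON and scalar results should match. Keeping a scalar fallback also makes it easy to develop and test on a desktop before moving to the Pi.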

I don’t have any experience with Jetson, but on paper it’s awesome, with 472 GFlops. Although the community is way smaller, NVIDIA does a better job of supplying libraries: the CUDA toolkit has lots of good stuff, see e.g. the cuFFT piece (I have used CUDA, cuFFT and other parts, just not on Jetson).

gary_0 2030 days ago

It still gives me a giggle that the flops numbers you're talking about were supercomputer-level when I was a kid, and now I can buy that kind of power with beer money and lose it in the back of a drawer.

Const-me 2030 days ago

On the other hand, it’s sad how badly we’ve failed on the software side.

We have devices capable of many GFlops in our pockets and many TFlops on our desks, yet we pay by the hour to use computers operated by companies like Amazon or Microsoft.

Lichtso 2031 days ago

Just try it out; it says it works with Vulkan 1.0:

https://github.com/DTolm/VkFFT

vanderZwan 2031 days ago

If you're willing to experiment, there's this thread on pixls.us [0] that might be interesting to follow:

> frustrated with heavy dependencies and slow libraries, i’ve been experimenting with some game technology to render raw image pipelines. in particular, i’m using SDL2 and vulkan. to spur some discussion, here is a random collection of bits you may find interesting or not.

> also please note this is just a rough prototype bashed together with very little care and lots of hardcoded things just to demonstrate what’s overall possible or not.

Since that is SDL2/Vulkan-based, you might get something out of the discussion there.

[0] https://discuss.pixls.us/t/processing-that-sucks-less/13016

robert_foss 2031 days ago

Vulkan can be used for compute, although I would guess that few applications support that.

For pre-Pi4 boards, there is an OpenCL implementation:

https://github.com/doe300/VC4CL

chpatrick 2031 days ago

I would start with whatever is simplest and use the GPU if you actually need to optimize performance.
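In that spirit, here is a small C++ timing harness for checking whether the simple CPU version already fits the frame budget before reaching for the GPU. For example, 10 fps gives a budget of 100 ms per frame. This is a sketch, and `average_frame_ms` and `meets_budget` are made-up helpers.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock time of one call to process_frame, in milliseconds.
// A few untimed warm-up calls run first so one-off costs (allocation, cold
// caches) do not skew the average.
double average_frame_ms(const std::function<void()>& process_frame,
                        int warmup = 3, int runs = 20) {
    assert(runs > 0);
    for (int i = 0; i < warmup; ++i) process_frame();
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) process_frame();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / runs;
}

// True if a frame time (ms) keeps up with the target frame rate.
bool meets_budget(double frame_ms, double target_fps) {
    return frame_ms <= 1000.0 / target_fps;
}
```

A typical check wraps the whole CPU pipeline in a lambda, e.g. `meets_budget(average_frame_ms([&] { run_pipeline(frame); }), 10.0)`, where `run_pipeline` is a hypothetical stand-in for your own code. Only if that check fails is it worth porting the hot spots to NEON or the GPU.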
