by drothlis 2015 days ago
I believe many (most?) opencv & numpy operations release the GIL.

Good point about OpenCV's in-place operations. Working in place can be tricky or impossible in numpy when you need to implement something that OpenCV doesn't provide. For example, we wrote the C code that I linked in my previous comment as an optimization of OpenCV's `matchTemplate` for one specific case: both input images are the same size. In C we do the multiplications as we iterate over the images, and we maintain a running sum in a single variable. In numpy you can't really do this; you have to multiply the whole array, which produces a full-size intermediate array, and then sum the result.
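To make the difference concrete, here is a rough sketch of the two shapes of computation. It assumes a sum of squared differences as the metric, purely for illustration; the function names are hypothetical and this is not the code from the links below. The pure-Python loop is far too slow for real use and only shows the structure a C loop would have.

```python
import numpy as np

def sqdiff_numpy(a, b):
    """Vectorised version: builds full-size intermediate arrays, then sums."""
    # Widen first so uint8 subtraction cannot wrap around.
    d = a.astype(np.int64) - b.astype(np.int64)  # full-size temporary
    return int((d * d).sum())                    # another temporary for d * d

def sqdiff_single_pass(a, b):
    """Single-pass version: multiply and accumulate one element at a time.

    This mirrors what a C loop does (one accumulator, no intermediate
    arrays), but in pure Python it is only an illustration.
    """
    assert a.shape == b.shape, "inputs must be the same size"
    total = 0
    for x, y in zip(a.flat, b.flat):
        d = int(x) - int(y)
        total += d * d
    return total

a = np.array([[1, 2], [3, 4]], dtype=np.uint8)
b = np.array([[0, 2], [5, 1]], dtype=np.uint8)
print(sqdiff_numpy(a, b), sqdiff_single_pass(a, b))
```

Both functions return the same value; the difference is only in how much memory traffic the computation generates along the way.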

For 720p images our C implementation[1] was 100x faster than our numpy implementation[2], and 10x faster than numba[3].

[1]: https://github.com/stb-tester/stb-tester/blob/v32/_stbt/sqdi...

[2]: https://github.com/stb-tester/stb-tester/pull/566/files#diff...

[3]: https://github.com/stb-tester/stb-tester/pull/566/files#diff...


That's a nice and tidy codebase, real pleasure to read.

> I believe many (most?) opencv & numpy operations release the GIL.

Any idea how I can determine this? I am prototyping a real-time machine vision application targeting 2x720p@240fps and I want to avoid writing any C++ for as long as possible.

I don't know much about Python's C API, but this is the line in the OpenCV Python bindings that drops the GIL: https://github.com/opencv/opencv/blob/4.5.0/modules/python/s...

(see https://docs.python.org/3/c-api/init.html#releasing-the-gil-... in the Python C API manual).

That's only called from the ERRWRAP2 macro here: https://github.com/opencv/opencv/blob/4.5.0/modules/python/s...

That macro, in turn, is called from gen_template_func_body in gen2.py: https://github.com/opencv/opencv/blob/4.5.0/modules/python/s...

And that seems to be a code generator that is generating the bindings for all the OpenCV functions: https://github.com/opencv/opencv/blob/4.5.0/modules/python/s...

As to testing this, on Linux I'd run `atop` to see if your program is using all available CPUs.
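A more direct check than watching `atop` is to time the same call serially and then from several threads. This is only a sketch: `thread_speedup` is a hypothetical helper, not part of OpenCV or numpy, and the result depends on how many cores you have. A ratio well above 1 suggests the call releases the GIL; a ratio near 1 suggests it holds it (or that the call is too short to measure, or already uses all cores internally).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def thread_speedup(fn, n_threads=4, calls_per_thread=1):
    """Return serial_time / threaded_time for the same total number of calls."""
    total_calls = n_threads * calls_per_thread

    # Baseline: every call made one after another on the current thread.
    start = time.perf_counter()
    for _ in range(total_calls):
        fn()
    serial = time.perf_counter() - start

    def worker():
        for _ in range(calls_per_thread):
            fn()

    # Same amount of work, split across n_threads Python threads.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(worker) for _ in range(n_threads)]
        for f in futures:
            f.result()  # re-raise any exception from the worker
    threaded = time.perf_counter() - start

    return serial / threaded

# time.sleep releases the GIL, so it should scale with the thread count.
print(thread_speedup(lambda: time.sleep(0.05)))
```

To test an OpenCV call you would pass something like `lambda: cv2.matchTemplate(img, tmpl, cv2.TM_SQDIFF)` with your own images. Because OpenCV may already run some functions on its own internal threads, it can help to call `cv2.setNumThreads(1)` first so that any speedup you see comes from the Python threads.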

Nice spelunking! If there is hope of avoiding C++ altogether, then I'll try testing it again on something beefier than my laptop. Thanks for taking the time and effort.