

by blt 4497 days ago

I saw their talk at Siggraph in 2012. This is really interesting work.

The target image processing algorithms are low level. Think "How can I write the fastest 3x3 box blur for Sandy Bridge and later architectures", not "How can I speed up my face detector".

Examples of scheduling decisions that Halide deals with: Should the image be processed in 4x4, 8x8, or 16x16 chunks? Should an intermediate processing stage be written to a temporary buffer, which hurts cache behavior but reuses results that are needed more than once? Or should it use a sliding window a couple of rows tall as a compromise?
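To make those trade-offs concrete, here is a rough sketch in plain Python (not Halide, and not from the talk). It assumes a two-stage 3x3 box blur, a horizontal 3-tap sum followed by a vertical one, computed only on interior pixels. All function names are made up for illustration. Each function produces the same output and differs only in how the work is ordered and where the intermediate stage lives.

```python
def row_sum(img, x, y):
    """First stage: horizontal 3-tap sum centred at (x, y)."""
    return img[y][x - 1] + img[y][x] + img[y][x + 1]


def blur_materialized(img):
    """Temporary buffer: compute all of stage one, then all of stage two.
    No redundant work, but the whole intermediate image sits in memory."""
    h, w = len(img), len(img[0])
    tmp = [[row_sum(img, x, y) for x in range(1, w - 1)] for y in range(h)]
    return [[(tmp[y - 1][x] + tmp[y][x] + tmp[y + 1][x]) / 9
             for x in range(w - 2)]
            for y in range(1, h - 1)]


def blur_fused(img):
    """No buffer: recompute stage one for every output pixel.
    Good locality, but each row sum is computed three times."""
    h, w = len(img), len(img[0])
    return [[(row_sum(img, x, y - 1) + row_sum(img, x, y)
              + row_sum(img, x, y + 1)) / 9
             for x in range(1, w - 1)]
            for y in range(1, h - 1)]


def blur_sliding(img):
    """Sliding window: keep only three rows of stage one alive at a time."""
    h, w = len(img), len(img[0])

    def stage_one_row(y):
        return [row_sum(img, x, y) for x in range(1, w - 1)]

    window = [stage_one_row(0), stage_one_row(1)]
    out = []
    for y in range(1, h - 1):
        window.append(stage_one_row(y + 1))   # newest row enters
        out.append([(a + b + c) / 9 for a, b, c in zip(*window)])
        window.pop(0)                         # oldest row leaves
    return out


def blur_tiled(img, tile=4):
    """Chunked traversal: visit output pixels tile by tile (fused per pixel)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for ty in range(1, h - 1, tile):
        for tx in range(1, w - 1, tile):
            for y in range(ty, min(ty + tile, h - 1)):
                for x in range(tx, min(tx + tile, w - 1)):
                    total = (row_sum(img, x, y - 1) + row_sum(img, x, y)
                             + row_sum(img, x, y + 1))
                    out[y - 1][x - 1] = total / 9
    return out
```

Python hides the cache and SIMD effects that make one of these faster than another on real hardware, so this only shows the shape of the choices, not their cost.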

This kind of work makes the difference between a laggy and an interactive Gaussian Blur radius slider in Photoshop.

The Halide code shown at the talk replaced really hard-to-read code full of SIMD compiler intrinsics: dozens of lines doing something that would take 5 lines in a naive implementation. With Halide, the code is almost as readable as the naive version because the scheduling is separated from the algorithm.
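A toy illustration of that separation, again in plain Python rather than Halide's actual API (every name here is hypothetical): the algorithm says what each output pixel is, the schedule says in what order pixels get visited, and swapping the schedule never touches the algorithm.

```python
def box_blur_at(img, x, y):
    """The algorithm: what one output pixel is. No loops over the image."""
    return sum(img[y + dy][x + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9


def row_major(w, h):
    """A schedule: visit pixels row by row."""
    for y in range(h):
        for x in range(w):
            yield x, y


def tiled(tile):
    """A schedule factory: visit pixels in tile x tile chunks."""
    def order(w, h):
        for ty in range(0, h, tile):
            for tx in range(0, w, tile):
                for y in range(ty, min(ty + tile, h)):
                    for x in range(tx, min(tx + tile, w)):
                        yield x, y
    return order


def realize(algorithm, img, schedule):
    """Run the algorithm over the interior pixels in the schedule's order."""
    h, w = len(img) - 2, len(img[0]) - 2
    out = [[0.0] * w for _ in range(h)]
    for x, y in schedule(w, h):
        out[y][x] = algorithm(img, x + 1, y + 1)
    return out
```

Calling `realize(box_blur_at, img, row_major)` and `realize(box_blur_at, img, tiled(4))` gives the same image; only the traversal order differs. A real schedule also covers vectorization, parallelism, and where intermediates are stored, which this sketch does not attempt.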

For an application like Photoshop, this is a big win because they will never choose code readability over performance. Performance wins every time. If they can get the same performance with readable code, they are very happy.

GPU code generation falls naturally out of the scheduling intelligence and restricted problem domain.

I have never used Halide, so I do not intend to endorse it, but this line of inquiry is absolutely useful for a certain niche of programming.