by linhat 4450 days ago
As a computer vision researcher, I find this very interesting, although I have yet to understand how they intend to generalize highly complicated optimization patterns (access order, locality, pipelines, platform limitations...), especially since some algorithms (other than the blur filter shown) require quite complicated access patterns on the image data and can, most of the time, only be optimized by hand. That doesn't mean they would not benefit from general optimization at all, just that they might be much faster when hand-optimized. Still, if Halide produces faster code for some cases (e.g. filter operations, among others), it will be worth its salt.

I went to one of the SIGGRAPH talks they did a couple years ago.

The theoretical plan is basically to write an optimizer that could intuit good schedules, while acknowledging that they don't yet have a good idea of how to do this. I think they can currently run some ML algorithms that churn through a bunch of different schedules and find the one best suited to a problem, but it's rather brute-force and slow at the moment.

That said, the conceptual separation of the algorithm from the schedule also makes it much easier to tune the schedule by hand than would otherwise be possible.
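To make the algorithm/schedule split concrete, here is a minimal sketch in plain Python. It is not Halide's API, and every function name is hypothetical. It assumes a separable two-stage 3x3 box blur like the blur filter mentioned above, with edge pixels clamped (also an assumption). The per-pixel definitions are written once; two hand-written "schedules" then evaluate them differently. Computing blur_x into a buffer first evaluates it once per pixel, while inlining it into blur_y needs no buffer but recomputes it three times per output pixel. The pixels are identical either way; only the memory/arithmetic trade-off changes.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]


def clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(v, hi))


# --- Algorithm: *what* each pixel is (hypothetical names, edges clamped) ---
def blur_x_at(inp: Image, x: int, y: int) -> float:
    w = len(inp[0])
    row = inp[y]
    return (row[clamp(x - 1, 0, w - 1)] + row[x] + row[clamp(x + 1, 0, w - 1)]) / 3.0


def blur_y_at(bx: Callable[[int, int], float], x: int, y: int, h: int) -> float:
    # bx supplies blur_x values; *how* it obtains them is the schedule's job.
    return (bx(x, clamp(y - 1, 0, h - 1)) + bx(x, y) + bx(x, clamp(y + 1, 0, h - 1))) / 3.0


# --- Schedule 1: compute all of blur_x into a buffer, then run blur_y ---
def schedule_root(inp: Image) -> Tuple[Image, int]:
    h, w = len(inp), len(inp[0])
    evals = 0
    tmp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            tmp[y][x] = blur_x_at(inp, x, y)
            evals += 1
    out = [[blur_y_at(lambda xx, yy: tmp[yy][xx], x, y, h) for x in range(w)]
           for y in range(h)]
    return out, evals


# --- Schedule 2: inline blur_x, recomputing it wherever blur_y needs it ---
def schedule_inline(inp: Image) -> Tuple[Image, int]:
    h, w = len(inp), len(inp[0])
    evals = 0

    def bx(x: int, y: int) -> float:
        nonlocal evals
        evals += 1
        return blur_x_at(inp, x, y)

    out = [[blur_y_at(bx, x, y, h) for x in range(w)] for y in range(h)]
    return out, evals


img = [[float((7 * x + 3 * y) % 10) for x in range(5)] for y in range(4)]
out_root, n_root = schedule_root(img)
out_inline, n_inline = schedule_inline(img)
assert out_root == out_inline  # same algorithm, same pixels
print(n_root, n_inline)  # 20 60
```

The point of the split is that changing the schedule never touches `blur_x_at` or `blur_y_at`, so a programmer can try different schedules without risking the correctness of the algorithm.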

Since this is a language and a compiler, my guess would be the answer to your question is: the compiler will optimize for the underlying platform. The whole point of Halide is stated in their abstract: "... make it easier to write high-performance image processing code" which is the exact opposite of "hand optimization". Halide allows developers to express what to do in a powerful, domain specific language - the compiler takes care of the "how".

This approach makes a lot of sense: abstract away the annoying low-level architecture details. They have a lot of targets, which is fantastic: x86/SSE, ARM v7/NEON, CUDA, Native Client, and OpenCL. Let the architecture specialist worry about the architecture specifics. The disadvantage: the achieved performance then depends on the quality and wisdom of the compiler. But once certain things are optimized for a specific architecture, every user benefits.

How they do it on the compiler end of things, I'm not sure. There are a number of techniques; among the simpler ones is auto-tuning. There is also a new term: "copious-parallelism" [0]. It acknowledges that to achieve performance portability across platforms, algorithms must offer explicit ways of parameterization and tuning to adapt to different platforms. I think this is the right concept, but I believe it could be implemented within the compiler. The domain specialist should not have to think about those things.

[0] http://www.hpcwire.com/2014/01/09/future-accelerator-program...
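Auto-tuning in its simplest form can be sketched as a brute-force search: expose a tuning parameter, time each candidate value, and keep the fastest. This is a minimal plain-Python sketch under stated assumptions, not how Halide or any particular tuner works. The knob here is a hypothetical strip height, `tile`, for a 3x3 blur computed strip by strip, and all names are illustrative. In pure Python the timings mostly reflect interpreter overhead, so this only illustrates the search loop, not real cache effects.

```python
import time
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]


def _c(v: int, hi: int) -> int:
    # Clamp an index to [0, hi]; this edge handling is an assumption.
    return max(0, min(v, hi))


def blur_tiled(inp: Image, tile: int) -> Image:
    """Hypothetical 3x3 box blur computed in horizontal strips of `tile` rows.

    For each strip, blur_x is computed only for the rows that strip needs
    (plus a one-row halo above and below), then consumed by blur_y.
    `tile` is the tunable knob; the output does not depend on it.
    """
    h, w = len(inp), len(inp[0])
    out: Image = [[0.0] * w for _ in range(h)]
    for ty in range(0, h, tile):
        y_end = min(ty + tile, h)
        bx: Dict[int, List[float]] = {}
        for y in range(_c(ty - 1, h - 1), _c(y_end, h - 1) + 1):
            r = inp[y]
            bx[y] = [(r[_c(x - 1, w - 1)] + r[x] + r[_c(x + 1, w - 1)]) / 3.0
                     for x in range(w)]
        for y in range(ty, y_end):
            up, dn = _c(y - 1, h - 1), _c(y + 1, h - 1)
            out[y] = [(bx[up][x] + bx[y][x] + bx[dn][x]) / 3.0 for x in range(w)]
    return out


def autotune(inp: Image, candidates: Sequence[int], repeats: int = 3) -> Tuple[int, float]:
    """Brute-force auto-tuning: time every candidate and keep the fastest."""
    best, best_t = candidates[0], float("inf")
    for t in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            blur_tiled(inp, t)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_t:
            best, best_t = t, elapsed
    return best, best_t


img = [[float((7 * x + 3 * y) % 10) for x in range(64)] for y in range(48)]
tile, secs = autotune(img, [1, 4, 16, 48])
# Whichever strip height wins, the pixels are identical.
assert blur_tiled(img, tile) == blur_tiled(img, 48)
print(f"fastest strip height: {tile} ({secs * 1e3:.2f} ms per run)")
```

A real tuner would search many interacting knobs at once (tiling, vectorization, parallelism, where intermediates are computed), so the candidate space grows combinatorially, which is one reason such a search is slow.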

The paper was specifically about writing your schedules by hand, not the compiler doing them automagically, which is a huge pipe dream at any rate. For the programs in the class, you are looking at differences of a few orders of magnitude in performance between schedules, which is why the programmer needs control: so they can be guaranteed the performance they expect. Compilers only reliably optimize at the level of a few percentage points, which is merely nice to have.

You're right, I completely misunderstood the purpose of Halide. I read up on it, and I see now that it simplifies doing the copious-parallelism style of tuning by hand. The schedule must be specified by the developer; the compiler doesn't do that.