by oneofthose
4450 days ago
Since this is a language and a compiler, my guess at the answer to your question is that the compiler optimizes for the underlying platform. The whole point of Halide is stated in their abstract: "... make it easier to write high-performance image processing code", which is the exact opposite of hand optimization. Halide lets developers express what to do in a powerful, domain-specific language, and the compiler takes care of the "how".

This approach makes a lot of sense: abstract away the annoying low-level architecture details. They support a lot of targets, which is fantastic: x86/SSE, ARM v7/NEON, CUDA, Native Client, and OpenCL. Let the architecture specialists worry about the architecture specifics. The disadvantage is that the achieved performance then depends on the quality of the compiler. But once something is optimized for a specific architecture, every user benefits.

How they do it on the compiler end of things, I'm not sure. There are a number of techniques. Among the simpler ones is auto-tuning: generate several variants of the same computation, time them on the target, and keep the fastest. There is also a new term: "copious-parallelism" [0]. It acknowledges that to achieve performance portability across platforms, algorithms must offer explicit ways of parametrization and tuning to adapt to different platforms. I think this is the right concept, but I believe it could be implemented within the compiler. The domain specialist should not have to think about those things.

[0] http://www.hpcwire.com/2014/01/09/future-accelerator-program...
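To make the auto-tuning idea concrete, here is a rough sketch in plain Python. To be clear, this is not how Halide does it (as I said, I don't know what happens on their compiler end); it only illustrates the general technique of timing one computation under several tuning parameters and keeping the fastest. The names `blur_rows_tiled` and `autotune`, the tile widths, and the 64x64 test image are all made up for illustration.

```python
import time


def blur_rows_tiled(image, tile):
    """3-tap horizontal box blur, walking the columns in tiles of width `tile`.

    The tile width changes only the traversal order, not the result.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for x0 in range(0, width, tile):
        x1 = min(x0 + tile, width)
        for y in range(height):
            row = image[y]
            for x in range(x0, x1):
                left = row[max(x - 1, 0)]           # clamp at the left edge
                right = row[min(x + 1, width - 1)]  # clamp at the right edge
                out[y][x] = (left + row[x] + right) / 3.0
    return out


def autotune(kernel, image, candidates, repeats=3):
    """Time `kernel` for each candidate parameter; return (best, timings)."""
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(image, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best  # keep the fastest of the repeats
    best_tile = min(timings, key=timings.get)
    return best_tile, timings


# Hypothetical input: a small synthetic 64x64 image.
image = [[float((x * 7 + y * 3) % 256) for x in range(64)] for y in range(64)]
candidates = [4, 16, 64]
best_tile, timings = autotune(blur_rows_tiled, image, candidates)
print("fastest tile width on this machine:", best_tile)
```

Which tile width wins depends on the machine, and that is the whole point: the description of the blur stays the same while the tuning parameter is picked per platform. A real compiler would presumably search a much larger space (vector width, loop order, parallelism) than this single knob.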