by PresidentZippy 837 days ago
Not to crap on someone's hard work, but usually reinventing something is only warranted if you can do it >2x better than the best available solution. Whether that means requiring half as much hardware to do the same job or requiring half as much manpower to use, usually it's a waste of time and money to make something that's only marginally better at best.

How much benefit did Intel expect to reap from rewriting an LLVM x64 compiler backend from the ground up?


The Intel compiler has a long history, at least going back to 2003.[1] At some point they realized that instead of maintaining their own frontend they can just use LLVM. So they ported their optimizers and backend over.

And on their own hardware the compiler is often significantly better.[2] This was part of their competitive advantage.

2x better is also a lot. It means that you only have to buy half as many servers. Even much smaller improvements are often worthwhile.

[1]: https://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Release...
[2]: https://www.intel.com/content/www/us/en/developer/tools/onea...

> 2x better is also a lot.

I specifically drew the line at 2x.

>It means that you only have to buy half as many servers.

That's what I said in my initial comment: "Whether that means requiring half as much hardware to do the same job or requiring half as much manpower to use..."

But even if I didn't already point that out, I'm curious how you would react in your everyday life if all the people you interact with were as insufferably pedantic as you.

Let's look at a specific example: Aurora at Argonne National Laboratory. It has an estimated cost of US$500 million and a power draw of 24.6 MW.[1] Even 1% less hardware means saving several million dollars. And that is just a single system.

[1]: https://en.wikipedia.org/wiki/Aurora_(supercomputer)

Isn't this a GPU backend, not an x64 one?

According to their documentation, the backend builds both x86_64 machine code and GPU kernels/FPGA microcode, but that just raises another question:

How is this significantly better than the existing tools for compiling C into GPU and FPGA code?

It would help if Intel published some kind of chalk talk with a live demo showing how much faster you can build HPC applications using their new toolkit.

I'm not summarily writing it off, but I need a little convincing before I put 20+ hours into trying it out myself.

Yes, though the LLVM x64 backend may turn out to be very similar to the upstream one. As for existing tools for compiling C to Intel's GPUs, this tool is it; there are no others.

Is there anything more to it than that? If not, the documentation would be a lot more helpful if it led with something straight to the point. Here's something that could go directly under the title of the README:

"This compiler consists of a custom LLVM frontend and backend. The backend compiles LLVM IR code into machine code consisting of x86_64 instructions and Intel GPU code. The frontend works in conjunction with the backend to compile C and C++ code with special optimizations which, when enabled, compile OpenMP routines into hardware-accelerated code targeting Intel GPUs, FPGAs, or AMD and NVIDIA GPUs."

As someone who only used OpenMP academically, I don't see much of a point in that. In the post-C++11 world, where we can write type-safe compile-time code, preprocessor macro definitions should stay in C code.

Until Intel GPUs are at least competitive with the big boys, interop with their products doesn't concern me a whole hell of a lot. I'm not going to plan my scientific computing applications around the integrated graphics found on cheap Wintel consumer devices.

The docs say it's a proprietary compiler for Intel hardware. I'm inclined to believe them on that.

It's worth noting that OpenMP pragmas are a totally different thing to C preprocessor macros. A pragma like 'omp target parallel for' means something like "take the following loop, build a GPU kernel out of it, arrange for data to be copied back and forth and to launch that kernel when control flow gets here, and arrange to link in all the openmp libraries and also run a bunch of compiler optimisations". A macro means "replace these tokens with these other ones".
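To make the pragma/macro distinction concrete, here is a minimal C++ sketch. It uses the host-threads form `#pragma omp parallel for` rather than the `target` form mentioned above, since GPU offload needs a toolchain built for it. `SQUARE` and `sum_of_squares` are made-up names for illustration, and `-fopenmp` is the GCC/Clang spelling of the flag; other compilers may differ.

```cpp
#include <vector>

// A preprocessor macro is plain token substitution: SQUARE(x) is replaced by
// ((x) * (x)) before the compiler proper ever sees the code.
#define SQUARE(x) ((x) * (x))

// A pragma is a directive to the compiler itself. Built with OpenMP enabled
// (e.g. -fopenmp on GCC or Clang), the loop below is split across threads and
// the per-thread partial sums are combined by the reduction clause. Built
// without OpenMP, the pragma is ignored and the loop runs serially.
double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += SQUARE(values[i]);
    }
    return total;
}
```

Calling `sum_of_squares` on a vector holding 1, 2, 3, 4 gives 30 either way; the pragma changes how the loop is executed, not what it computes.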

OpenMP is essentially a really big runtime library dealing with threads, scheduling execution, running code on GPUs and so forth. It is sort-of usable in that form. If you're determined then making calls directly into libomp.so and libomptarget.so will make your will a reality. All the pragma syntax is about transforming application code into a lot of calls into that library with appropriately constructed tables of data. And then the compiler works hard to optimise this, e.g. removing calls that don't need to happen, simplifying others, deduplicating some.
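As a rough illustration of "pragma syntax becomes calls into a runtime library", here is a toy hand-lowering in C++. Everything in it is an assumption made for the sketch: `toy_parallel_for` is a hypothetical stand-in built on `std::thread`, not an entry point of libomp.so or libomptarget.so, and a real OpenMP runtime does far more (thread pools, scheduling policies, device offload).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical "runtime" call: split [0, n) into contiguous chunks, run
// body(lo, hi) for each chunk on its own thread, then join all threads.
void toy_parallel_for(std::size_t n, unsigned workers,
                      const std::function<void(std::size_t, std::size_t)>& body) {
    if (workers == 0) workers = 1;
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t lo = std::min(n, static_cast<std::size_t>(w) * chunk);
        const std::size_t hi = std::min(n, lo + chunk);
        if (lo == hi) break;  // no work left for this or any later worker
        pool.emplace_back([&body, lo, hi] { body(lo, hi); });
    }
    for (std::thread& t : pool) t.join();
}

// What the programmer would write with OpenMP is roughly:
//   #pragma omp parallel for
//   for (i = 0; i < n; ++i) data[i] *= factor;
// Hand-lowered version: the loop body is outlined into a callable and handed
// to the runtime, which decides how the iterations are distributed.
void scale_in_place(std::vector<double>& data, double factor, unsigned workers) {
    toy_parallel_for(data.size(), workers,
                     [&data, factor](std::size_t lo, std::size_t hi) {
                         for (std::size_t i = lo; i < hi; ++i) data[i] *= factor;
                     });
}
```

Each chunk touches a disjoint range of `data`, so the threads do not race. The point is only the shape of the transformation: loop body outlined, runtime call inserted.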

Syntactically OpenMP is a really good fit for Fortran. The invocations look completely appropriate there. For C++, it does tend to upset the sensibilities of the programmers. I personally think it's wildly funny for people who are content with the syntactic horror show of C++ to decide the OpenMP extensions are ugly but there we go, normalisation of deviance and all that.

On a more philosophical level, and what drew me to implementing OpenMP originally, CUDA is a problem. Not only in the vendor lock-in to NVIDIA sense - it's also a deeply nasty language to program with. I especially dislike the warp intrinsics - they take a bitmap corresponding to the CFG of your program, which you are supposed to compute manually (across branches, loops and so forth) and pass around into library functions. GPUs are excellent machines and I want to be able to program them in something which is not CUDA.

Ok, now I'm learnding some interesting and/or valuable shit.

I'm familiar with compiler intrinsics (e.g. __sync_add_and_fetch), but I just assumed (incorrectly) that "#pragma omp parallel for" was just a macro that adds pthread API calls into a for loop to create new threads and join when finished.

>Syntactically OpenMP is a really good fit for Fortran.

I can get on board with Fortran for the niche of scientific computing, although again my qualms are with using it in C++. Too many people say they know C++, but then write "C++" code with raw pointers. I don't use C++17 for performance, and most "zero-cost abstractions" are a lie; I use it for type safety. If you buy into the modern C++ way, you'll catch a lot of stuff at compile time that systems programmers using C and web devs using a litany of other weakly or dynamically-typed languages catch in their production environment.

>syntactic horror show of C++

Other than "string" not being a native type, I'd reckon what you really hate is not the syntax itself, but the compiler errors. Granted, if you hire the kind of people who post on Stack Overflow, you can get wacky shit like this:

template<typename Testicle, typename... Diseases>
static std::optional<std::tuple<Diseases...>>
deeply::nested::namespaced::classes::suck_balls(const Testicle& left_nut, Testicle&& right_nut) noexcept; // Is "classes" a class or another namespace?

But I've had to fix other people's spaghetti code in 4 other languages, so I stopped blaming the language many moons ago.

>what drew me to implementing OpenMP originally, CUDA is a problem

Did you try Vulkan Compute, and if so what problems did you run into? 200+ lines of "setup" code, similar to OpenCL programming?

I ask because the entirety of my systems programming career was not speeding up number crunching, but reducing IPC and making things run asynchronously.