by Kon-Peki 1538 days ago
This has been discussed on HN before.

I don't condone Intel's behavior, but let's be honest here: AMD underinvests in software and expects others to pick up the slack. That isn't acceptable.


What should they have done instead? Built a compiler with a "cripple Intel" function? So people would have to download the executable that's fastest on their CPU, even though they use the same instruction set?

The issue here is that they used a slower code path even on CPUs that could run the faster one, just because they were made by a competitor.

You say "AMD should have made their own compiler", but why? What else should they have made? An OS? An office suite? Why?

AMD should concentrate on making LLVM and GCC work great on AMD processors, by contributing the needed code. They are already making some contributions but could be doing more, and they could be funding experts to work on that and giving those experts the information they need.
But they already do this. AMD is one of the largest corporate contributors to LLVM and GCC.

It's Intel that tends to phone this in and make everyone else pick up the slack.

Per https://www.phoronix.com/scan.php?page=news_item&px=LLVM-Rec..., Intel actually contributes (slightly) more to LLVM than AMD does.
They do. Actually, their own (LLVM-based) compilers are about as fast as GCC and LLVM.

https://www.phoronix.com/scan.php?page=article&item=aocc32-c...

I don't know if it necessarily says much that their LLVM-based compiler is about as fast as LLVM.
It says they might as well stop dividing their effort and focus on the upstream LLVM (or alternatively treat their own version as just a development branch that they can push upstream from). While they have expertise on the details of their processor they may benefit more by cooperating with all the compiler experts outside their company.
To fix this problem, AMD would have to make LLVM and GCC work great on Intel processors. That would be the only way to keep people from using the Intel compiler for extra performance and ending up with binaries that are crippled on AMD. Clearly that's not a solution to this problem.
AMD's software offerings (e.g. look at uProf vs. VTune) are functional at best. Intel's are much easier to use, have a lot more documentation, and actually make your life easier versus having basically just a firehose of data.
Very likely, this was not done intentionally.

I think we can simply imagine a common scenario: some employee working for Company X, developing a compiler suite, and adding necessary optimizations for Company X's processors. Meanwhile, Company Y's processors don't get as much focus (perhaps due to the employee not knowing about Company Y's CPUIDs, supported optimizations for different models, etc.). Thus, Company Y's processors don't run as quickly with this particular library.

Why does this have to be malicious intent? Surely it's not surprising to you that Company X's software executes quicker on Company X's processors: I should hope that it does! The same would hold true if Company Y were to develop a compiler; unique features of their processors (and perhaps not Company X's) should be used to their fullest extent.

No, this was definitely intentional. Intel is doing extra work to gate features on the manufacturer ID when there are feature bits which exist specifically to signal support for those features (and these bits were defined by Intel themselves!).

If they had fixed the issue shortly after it was publicly disclosed it might have been unintentional, but this issue has been notorious for over a decade and they still refuse to remove the unnecessary checks. They know what they're doing.

The thing is: the bits to check for SSE, SSE2, ..., AVX, AVX2, AVX-512? They're in the same spot on Intel and AMD CPUs. So you don't need to switch based on manufacturer. The fact that they force a `GenuineIntel` check makes it seem malicious to many.
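To make the "same spot" point concrete, here is a minimal sketch in Python (chosen only so it runs anywhere; it decodes register values you pass in and does not execute CPUID itself). The bit positions and the vendor-string layout are the documented ones; the function and table names are mine, purely for illustration.

```python
import struct

# Documented CPUID bit positions; identical on Intel and AMD parts.
LEAF1_EDX = {"sse": 25, "sse2": 26}
LEAF1_ECX = {"sse3": 0, "ssse3": 9, "sse4.1": 19, "sse4.2": 20, "avx": 28}
LEAF7_EBX = {"avx2": 5, "avx512f": 16}


def vendor_string(ebx, edx, ecx):
    """Leaf 0 returns the 12-byte vendor ID in EBX, EDX, ECX (in that order)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def decode_features(leaf1_ecx, leaf1_edx, leaf7_ebx):
    """Return the names of all features whose bits are set.

    Caveat: for AVX and later, real code must also confirm OS support
    (OSXSAVE plus XGETBV); that step is left out of this sketch.
    """
    found = set()
    tables = ((LEAF1_ECX, leaf1_ecx), (LEAF1_EDX, leaf1_edx), (LEAF7_EBX, leaf7_ebx))
    for table, reg in tables:
        for name, bit in table.items():
            if (reg >> bit) & 1:
                found.add(name)
    return found


print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))  # GenuineIntel
print(vendor_string(0x68747541, 0x69746E65, 0x444D4163))  # AuthenticAMD
```

Note that `decode_features` never looks at the vendor string: the same register values decode to the same feature set no matter who made the chip.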
All browsers pretend to be MSIE (and all compilers pretend to be GCC). You'd think AMD would make it trivial to change the vendor ID string to GenuineIntel for "compatibility".
That's not how these CPUs work.

The CPUID instruction lets software ask the CPU whether an instruction set is supported. Code emitted by Intel's compiler only performs that query if the CPU is from Intel, instead of always checking.

AMD can choose to implement (or not) any instruction set that Intel specifies, and Intel can choose to implement (or not) any instruction set AMD specifies. However, it would in 100% of cases be wrong to check who made the CPU instead of checking the implemented instruction set. AMD implements MMX, SSE1-4, and AVX1 and 2. Any software compatible with these must work on AMD CPUs that also implement these instructions.
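A rough model of the two dispatch policies, as a sketch only. This is not Intel's actual dispatcher code; the function names, path labels, and the example feature set are all hypothetical.

```python
def pick_path_by_features(features):
    """Dispatch on what the CPU reports; the vendor is never consulted."""
    for level in ("avx2", "avx", "sse2"):
        if level in features:
            return level
    return "baseline"


def pick_path_vendor_gated(vendor, features):
    """The pattern criticized above: feature bits only count for one vendor."""
    if vendor != "GenuineIntel":
        return "baseline"
    return pick_path_by_features(features)


amd_like = {"sse2", "avx", "avx2"}  # illustrative feature set, not a real chip
print(pick_path_by_features(amd_like))                   # avx2
print(pick_path_vendor_gated("AuthenticAMD", amd_like))  # baseline
print(pick_path_vendor_gated("GenuineIntel", amd_like))  # avx2
```

Same feature set, different code path, and the only input that changed is the vendor string.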

If AMD ever chooses to sue Intel over this (likely as a Sherman Act violation, same as the 2005 case), a court would likely side with AMD due to the aforementioned previous case: Intel has an established history of violating the law to further its own business interests.

I’m with you generally, but having written some code targeting these instructions from a disinterested third-party perspective, there are big enough differences in some instructions in performance or even behavior that can sincerely drive you to inspect the particular CPU model and not just the cpuid bits offered.

Off the top of my head, SSSE3 has a very flexible instruction to permute the 16 bytes of one xmm register at byte granularity using each byte of another xmm register to control the permutation. On many chips this is extremely cheap (eg 1 cycle) and its flexibility suggests certain algorithms that completely tank performance on other machines, eg old mobile x86 chips where it runs in microcode and takes dozens or maybe even hundreds of cycles to retire. There the best solution is to use a sequence of instructions instead of that single permute instruction, often only two or three depending on what you’re up to. And you could certainly just use that replacement sequence everywhere, but if you want the best performance _everywhere_, you need to not only look for that SSSE3 bit but also somehow decide if that permute is fast so you can use it when it is.
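For anyone who hasn't used it, here is a small Python reference model of that byte shuffle (the 128-bit `pshufb` form), plus a sketch of the "legal versus fast" decision. The shuffle semantics are the documented ones; `pick_shuffle_impl` and its `shuffle_is_slow` input are hypothetical, standing in for whatever per-model knowledge a real dispatcher would carry.

```python
def pshufb(src, ctrl):
    """Reference model of the 128-bit byte shuffle: 16 bytes in, 16 bytes out."""
    assert len(src) == 16 and len(ctrl) == 16
    out = bytearray(16)
    for i, c in enumerate(ctrl):
        # A set high bit in the control byte zeroes the lane; otherwise
        # the low four bits select which source byte to copy.
        out[i] = 0 if c & 0x80 else src[c & 0x0F]
    return bytes(out)


def pick_shuffle_impl(has_ssse3, shuffle_is_slow):
    """Hypothetical policy: the feature bit says 'legal', not 'fast'."""
    if has_ssse3 and not shuffle_is_slow:
        return "single_pshufb"
    return "multi_instruction_fallback"


data = bytes(range(16))
reverse = bytes(range(15, -1, -1))
print(pshufb(data, reverse) == reverse)  # True: the 16 bytes come out reversed
```

The CPUID bit can only feed `has_ssse3`; the second argument has to come from somewhere else, which is the whole problem.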

Much more seriously, Intel and AMD’s instructions sometimes behave differently, within specification. The approximate reciprocal and reciprocal square root instructions are specified loosely enough that they can deliver significantly different results, to the point where an algorithm tuned on Intel to function perfectly might have some intermediate value from one of these approximate instructions end up with a slightly different value on AMD, and before you know it you end up with a number slightly less than zero where you expect zero, a NaN, square root of a negative number, etc. And this sort of slight variation can easily lead to a user-visible bug, a crash, or even an exploitable bug, like a buffer under/overflow. Even exhaustively tested code can fail if it runs on a chip that’s not what you exhaustively tested on. Again, you might just decide to not use these loosely-specified instructions (which I entirely support) but if you’re shooting for the absolute maximum performance, you’ll find yourself tuning the constants of your algorithms up or down a few ulps depending on the particular CPU manufacturer or model.
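A toy illustration of how that goes wrong, in Python. The error bound is the documented one for the SSE approximate reciprocal (relative error at most 1.5 * 2^-12); everything else is an assumption for the sketch: the two "chips" are hypothetical and simply err in opposite directions, whereas real hardware error varies per input.

```python
import math

MAX_REL_ERR = 1.5 * 2.0 ** -12  # documented bound for the SSE approximations


def approx_rcp(x, rel_err):
    """Model an approximate reciprocal with a given signed relative error."""
    assert abs(rel_err) <= MAX_REL_ERR
    return (1.0 / x) * (1.0 + rel_err)


def residual(x, rel_err):
    """Mathematically 1 - x * (1/x) is exactly zero."""
    return 1.0 - x * approx_rcp(x, rel_err)


def safe_sqrt(v):
    """Clamp before the square root so a tiny negative cannot escape."""
    return math.sqrt(max(v, 0.0))


# Two hypothetical chips, both within spec, erring in opposite directions.
for name, err in (("chip_a", -1e-4), ("chip_b", +1e-4)):
    r = residual(3.0, err)
    print(name, "residual negative:", r < 0.0, "clamped sqrt ok:", safe_sqrt(r) >= 0.0)
```

Code tuned only on the first chip never sees a negative residual, so an unclamped square root looks fine right up until it runs on the second.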

I’ve even discovered problems when using the high-level C intrinsics that correspond to these instructions across CPUs from the same manufacturer (Intel). AVX512 provided new versions of these approximations with increased precision, the instruction variants with a “14” in their mnemonic. If using intrinsics, instruction selection is up to your compiler, and you might find compiling a piece of code targeting AVX2 picks the old low precision version, while the compiler helpfully picks the new increased-precision instructions when targeting AVX-512. This leads to the same sorts of problems described in the previous paragraph.
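The same effect in miniature, as a hedged Python sketch. The two bounds are the documented ones (1.5 * 2^-12 for the older approximations, 2^-14 for the "14" variants); the tolerance and the worst-case model are hypothetical, picked to show a check that passes on one build and fails on the other.

```python
OLD_BOUND = 1.5 * 2.0 ** -12  # older SSE/AVX approximate reciprocal
NEW_BOUND = 2.0 ** -14        # the AVX-512 "14" variants


def worst_case_rcp(x, bound):
    """Model the least accurate result a given bound allows."""
    return (1.0 / x) * (1.0 + bound)


def passes_tolerance(x, bound, tol):
    """Does x * rcp(x) land within tol of 1?"""
    return abs(x * worst_case_rcp(x, bound) - 1.0) <= tol


TOL = 2.0 ** -13  # hypothetical tolerance tuned against the AVX-512 build

print(passes_tolerance(7.0, NEW_BOUND, TOL))  # True
print(passes_tolerance(7.0, OLD_BOUND, TOL))  # False
```

Nothing in the source changed between the two runs, only which instruction the compiler selected for the same intrinsic.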

I really wish you could just read cpuid, and for the most part you’re right that it’s the best practice, but for absolutely maximum performance from this sort of code, sometimes you need more information, both for speed and safety. I know this was long-winded, and again, I entirely understand your argument and almost totally agree, but it’s not 100%, more like 100-epsilon%, where that epsilon itself is sadly manufacturer-dependent.

(I have never worked for Intel or AMD. I have been both delighted and disappointed by chips from both of them.)

I don't think you read the article. Go read it first before you make your hypothesis. If it was as easy to fix as using an environment variable (which no longer works), then it was done intentionally.
https://news.ycombinator.com/newsguidelines.html

> Please don't comment on whether someone read an article.

OK
I don't think the fact that it can be enabled/disabled by an environment variable indicates malicious intent. It could be as simple as Intel not caring to test their compiler optimizations on competitors' CPUs. If I had to distribute one of two types of binaries (one optimized but liable to break, the other unoptimized and unlikely to break), I would default to distributing the unoptimized version. Slow is better than broken.

I understand some end users may not be able to recompile the application for their machines, but I wouldn't say it's Intel's fault; rather, it's the fault of the distributors of that particular application. For example, if AMD users want Solidworks to run faster on their system, they should ask Dassault Systemes for AMD-optimized binaries, not the upstream compiler developers!

Anyways, for those compiling their own code, why would anyone expect an Intel compiler to produce equally optimized code for an AMD cpu? Just use gcc/clang or whatever AMD recommends.

The thing that gets me about Intel's culture, as someone who worked there, was that Intel as an organisation was completely unable to actually accept they'd done anything wrong. Ever.

There are lots of cases where Intel has either screwed up or done things that were unarguably anti-competitive. That happens at every company. I don't like Uber, but I'm not going to blame Uber today for the fuckery that Kalanick got up to.

In each case you could ask Intel HR or Intel senior management what they thought about it, and it was never Intel's fault. The answers to any questions about this sort of stuff would be full of pettifogging, passive voice, and legalese. The result was that the internal culture was an extremely low-trust environment, since you knew people were willing to be transparently intellectually dishonest to further their careers. I haven't been there since Gelsinger arrived, but I hope that changes; I wonder how much it can change in the legal environment we're in.

I don't think this is dishonesty - it's auteur mentality. In Intel's view, AMD was a second-source vendor that went rogue, and gets to free-ride on their patents because Intel couldn't be arsed to extend x86 to 64-bit. If they had their way, they'd own the x86 ISA interface and all their competition would be incompatible architectures that you have to recompile for. Crippling AMD processors with their C compiler wasn't dishonest, it was DRM to protect their """intellectual property"""[0].

Gelsinger was the head designer on the 486, so he was around during the time when Intel was obsessed with keeping competition out of their ISA and probably has a case of auteur mentality, too.

[0] In case you couldn't tell, I really hate this word. The underlying concepts are, at best, necessary evils.

> In Intel's view, AMD was a second-source vendor that went rogue, and gets to free-ride

In Russia's view Ukraine is a breakaway province that was getting a free ride from all the transit gas fees and must be brought to heel.

In medieval mentality women are property

What I mean is, it's really a tool to justify why your dishonest behaviour is okay.

I think it's great if a hardware company leaves the software for others. This leads to open specifications.
At the firmware/driver level, fully open specifications for high-performance hardware are an impossible dream.

At best, detailed documentation is a lower priority item below "make it work" and "increase performance".

At worst, it requires exposing trade secrets.

Edit: It'd probably be more productive for everyone if we set incentives and work such that the goal we want (compilers that produce code that runs optimally on Intel, AMD, and other architectures) isn't contingent on Intel writing them for non-Intel architectures. (Said somewhat curmudgeonly, because everyone complains about things like this but doesn't realize how insanely hard and frustratingly edge-case-ridden compiler work is.)

If Intel did that there probably wouldn't be a software suite at all for their processors.

Compared to VTune, just about all open source profilers are either a bad joke or like programming in Basic in a C++ age.

No, just don't falsely market your product as fair or neutral.
It’s the Intel MKL. I don’t think Intel has ever even endorsed using it on other vendors' CPUs, much less claimed that it is “fair” or “neutral”.
Well:

    On November 12, 2009 AMD and Intel Corporation announced a comprehensive settlement agreement to end all outstanding legal disputes between the companies, including antitrust and patent cross license disputes. In addition to a payment of $1.25B that Intel made to AMD, Intel agreed to abide by an important set of ground rules that continue in effect until November 11, 2019. 

    Customers and Partners

    With respect to customers and partners, Intel must not:*
    [...]
    Intentionally include design/engineering elements in its products that artificially impair the performance of any AMD microprocessor.
https://www.amd.com/en/corporate/antitrust-ruling

I like that 'in effect until November 11, 2019.' part :D

AMD[1] and NVidia[2] do "make" their own compilers. AMD is notorious for a "build it and they will come" mentality, despite the fact that this hasn't worked. AMD needs to make it easy to adopt their hardware, and the way this is done is with software.

When they finally get to the point that their drivers/libs are as easy to install as Nvidia's, it might be too late. I've argued this with AMD folks before.

The barriers to adoption need to be low. Friction needs to be low. They need to target ubiquity[3].

[1] https://developer.amd.com/amd-aocc/

[2] https://developer.nvidia.com/nvidia-hpc-sdk-downloads

[3] https://blog.scalability.org/2008/02/target-ubiquity-a-busin...

AMD is the best on Linux right now. But that's mostly thanks to them opening up their hardware to driver developers.
I was getting better performance out of the NVidia HPC SDK compilers, but then again, the old PGI compilers it is based upon (with an LLVM backend now), have always been my go-to for higher performance code.

I've got some Epycs and Zen2s at home here, and I have both compilers. Haven't done testing in recent months, but they've been updating them, so maybe I should look into that again. Thanks for the nudge!

Ah, I should've clarified here I mainly meant the GPUs :) Though keep in mind if you do computing (through CUDA), nvidia is still necessary.
> NVidia[2] do "make" their own compilers.

Actually, Nvidia bought the Portland Group compilers. And Intel's Fortran compiler is (or rather was; its backend is now LLVM) MS's compiler by way of DEC, Compaq, and HP: MS Visual Fortran 4 -> DEC Visual Fortran 5 -> Compaq Visual Fortran 6 -> Intel Visual Fortran ;).

I know about the PGI purchase ... was unaware of the Intel link to MSFT. Huh.
You call Nvidia driver installation easy? Every bit of "ease" about that is hardly Nvidia's doing.
I'm not sure of what issue you have with my statement. For me, it is a painless download + sh NVIDIA-....run. I have mostly newer GPUs, though the 3 systems (1 laptop and 2 desktops) with older GTX 750ti and GT 560m run the nouveau driver (as Nvidia dropped support for those).

It's a 13-year-old laptop and still running strong (on Linux, though). The desktops are Sandy Bridge based. The RTX2060 and RTX3060 are doing fine with the current drivers. I usually only update when CUDA changes.

But yeah, it's pretty simple. I can't speak to non-Linux OSes generally, though my experiences with Windows driver updates have always been fraught with danger.

My Zen 2 laptop has a built-in Renoir iGPU, and I use it alongside the NVidia dGPU that is also built in (GTX 1660ti). I use Linux Mint's packaging system there for the GPU switcher. I run the AMD on the laptop panel and the NVidia on the external display. Outside of weirdness with kernel 5.13, I've not had any problems with this setup.

My point is, that "single download" or apt command to you is a royal pain in the ass to maintain, and makes things like kernel hacking a royal nightmare, all for Nvidia to play stupid out of tree games with the linux kernel maintainers. Easy "for a subset of users" does not excuse going out of the way to create more friction where none need exist.

But I'm glad your preferred workloads are unaffected. That counts for something I guess.

I think it's more nuanced than that:

In the past, AMD just straight up had horrible software.

More recently, AMD have been investing more in open software, probably with the goal that a community forms and they get "leverage" / ROI for their investment.

On the flip side, Intel invest heavily in high-quality but jealously guarded and closed source software.

With this nuance, I'm not so sure it's clear cut which one is "acceptable," and it's an interesting ethical question about Open Source and open-ness in general.

AMD still has horrible software: compare CUDA to whatever crap AMD thinks you should use. Truth is, it's even hard to say what their alternative is, not to mention how poorly they support what is, or at least should be, their second if not most important/lucrative target.
And Intel has sandbagged us with 4 cpu cores for ages, leading to software that isn't being optimized for more cores. Suddenly AMD starts pushing many cores with high single core performance and Intel magically turns hyperthreading on for lower tier cpus and starts putting out way more cores.
AMD pays substantial royalties to Intel for x86.

https://jolt.law.harvard.edu/digest/intel-and-the-x86-archit...

However, this will become moot as even Intel is shifting towards LLVM.

https://www.intel.com/content/www/us/en/developer/articles/t...

There's a variety of options that are available here, and I don't buy the argument that AMD's behavior is automatically unethical.

A. Company makes and sells hardware, and offers no software.

B. Company makes and sells uniquely featured hardware, and offers software that uses those unique features.

C. Company makes and sells hardware that adheres to an industry standard, and offers software that targets hardware adhering to that standard.

D. Company makes and sells hardware that adheres to an industry standard, then uses their position in related markets to give themselves an unfair advantage in the hardware market.

Of these, options A, B, and C are all acceptable options. AMD has traditionally chosen option A, which is a perfectly reasonable option. There's no reason that a company is obligated to participate in a complementary market. Option D is the only clearly unethical option.

Intel's legitimate course is to make their CPUs run actually faster than the competition, instead of tricking people into running slower code on the competition.
> AMD has traditionally chosen option A, which is a perfectly reasonable option

AMD has optimized libraries https://developer.amd.com/amd-aocl/ and their own compilers: https://developer.amd.com/amd-aocc/