by dragontamer 1487 days ago

> Surely the people at these labs will want to run ordinary DL frameworks at some point

I don't know about that. A lot of these labs are doing physics simulations and are probably happy to stick with their dense-matrix multiply / BLAS routines.

Deep learning is a newer thing. These national labs can run it, of course, but they have existed for many decades and have plenty of work to do without deep learning.

> or do they have the money and time to always build entirely custom stacks?

Given all the talk about OpenMP compatibility and Fortran... my guess is that they're largely running legacy code in Fortran.

Perhaps some new researchers will come in and try to get some deep-learning cycles in the lab and try something new.

3 comments

jcranmer 1487 days ago

From my limited exposure to the HPC groups at the labs, there's a mixture of languages in use. It seems that modern C++ is the dominant language for a lot of new projects--some of the people I talked to were working on libraries that aggressively used C++11/C++14 features.

The biggest challenge the national labs face is that there's not really any budget (or appetite) to rewrite software to take advantage of hardware features (particularly the GPU-based accelerators that are all the rage nowadays). You might be able to get a code rewritten once, but an era where every major HPC hardware vendor wants you to rewrite your code into their custom language for their custom hardware results in code that will not take advantage of the power of that custom hardware. OpenMP, being already fairly widespread, ends up becoming the easiest avenue to take advantage of that hardware with minimal rewriting of code (tuning a pragma doesn't really count as rewriting).
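To make the "tuning a pragma" point concrete, here is a minimal sketch in C, assuming a compiler with OpenMP 4.0 or later and target offload enabled. The function name `daxpy_offload` is made up for illustration and is not taken from any lab code. The loop body is what a serial version would contain; only the pragma asks for accelerator execution, and a compiler built without OpenMP support ignores it and runs the loop on the CPU.

```c
/* Hypothetical example: y = a*x + y, offloaded to an accelerator if one is
 * available. Without OpenMP the pragma is ignored and the loop runs serially. */
void daxpy_offload(int n, double a, const double *x, double *y)
{
    /* map(to:) copies x to the device; map(tofrom:) copies y there and back. */
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Moving between CPU threads and GPU offload is then mostly a matter of changing the directive line, which is the kind of edit that arguably doesn't count as a rewrite.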

Symmetry 1487 days ago

Also, while NVidia has been adding extra AI acceleration to their chips, AMD has been throwing in the extra double-precision resources that HPC generally requires. If you're training an AI rather than simulating the climate/a thermonuclear explosion/etc., then you're probably better off using NVidia cards, but AMD made the right technical investments to get these supercomputer contracts.

dekhn 1487 days ago

It's kind of surprising that nvidia hasn't purchased AMD. It really feels like there's a single company between the two that would be truly effective: AMD for the classic CPU oomph, nvidia for the GPU oomph, combining their strengths in interconnects. It would be a player from the high-end PC to the supercomputer market, without even pretending to go for the low-power market (ARM).

jcranmer 1487 days ago

> It's kind of surprising that nvidia hasn't purchased AMD.

One word: antitrust. The discrete GPU market these days consists of Nvidia and AMD, with Intel only just now dipping its toes into the market (I don't think there's anything saleable to retail customers yet). Nvidia buying AMD would make it a true monopoly in that market, and there's no way that would pass antitrust regulators. Nvidia recently tried to buy ARM, and even that transaction was enough for antitrust regulators to say no.

ridgered4 1487 days ago

AMD and Nvidia were in talks to merge at one point; apparently the talks fell apart because Nvidia's CEO insisted on being the CEO of the combined company and AMD would have none of that. So AMD purchased ATI instead, probably overpaid for it, and probably pushed the Bulldozer concept too hard in an effort to prove it was worth it after all.

Nvidia actually used to develop chipsets for AMD processors, including ones with onboard GPUs. They did so for Intel as well, but they had a much more serious relationship with AMD in my estimation. This stopped with the ATI purchase: since ATI was Nvidia's main competitor, the two companies stopped working together. Intel later killed 3rd-party chipsets altogether, and AMD had to do a lot of chipset work they weren't doing before.

I sometimes wonder what would have happened if they had merged back then. I personally think a Jensen Huang-run AMD would have done much better than AMD+ATI did in that era. I could easily see ATI having collapsed. What would the consoles use now? Would Nvidia have been as aggressive as it has been without the strategic weakness of not controlling the platform its products run on?

krylon 1487 days ago

Intel and AMD have a patent-licensing agreement where Intel licenses their x86 stuff to AMD, and AMD licenses their amd64 stuff to Intel. AFAIK, the moment AMD gets bought by another company, they can no longer use Intel's patents, and the moment that happens, Intel can no longer use AMD's patents. I'm not sure how much of x86/amd64 you can legally implement without infringing on any of these patents, but it might very well result in a really awkward situation.

Sure, the new owners could re-negotiate with Intel, and maybe nothing would change. But who knows? A combined AMD/nVidia might be a sufficient threat to Intel they might pull some desperate moves.

(In some timeline, this turns out to be the boost that makes RISC-V the new "standard" ISA, but I am not so optimistic it is the one we live in.)

paulmd 1487 days ago

I think based on recent history you can argue that NVIDIA is very aware of the potential anticompetitive actions that could result if they kill or even substantially pass AMD.

There really used to be a lot of intra-generational tweaking and refinement. If you look back at Maxwell, there were really at least 3 and I suspect 4 total steppings of the architecture (GM107, GM204/GM200, and GM206 - and I suspect GM200 was a separate "stepping" too due to how much higher it clocks than GM204 - which is the opposite of what you'd expect from a big chip). Kepler had at least 4 major versions (GK1xx, GK110B, GK2xx, GK210), and Fermi had at least 2 (although that's where I'm no longer super familiar with the exact details).

Anyway point is there used to be a lot more intra-generational refinement, and I think that has largely stopped, it's just thrown over the wall and done. And I think the reason for that is that if NVIDIA really cranked full-steam ahead they'd be getting far enough ahead of AMD to potentially start raising antitrust concerns. We are now in the era of "metered performance release", just enough to stay ahead of AMD but not enough to actually raise problems and get attention from antitrust regulators.

Same thing for the choice of Samsung 8nm for Ampere and TSMC 12nm for Turing, while AMD was on TSMC 7nm for both of those. Sure, volume was a large part of that decision, but they're already matching AMD with a 1-node deficit (Samsung 8nm is a 10+, and the gap between 10 and TSMC 7 is huge to begin with) and they were matching with a 1.5 node deficit during the Turing generation (12FFN is a TSMC 16+ node - that is almost 2 full nodes to TSMC 7nm). They cannot just make arbitrarily fast processors that dump on AMD, or regulators will get mad, so in that case they might as well optimize for cost and volume instead. If they had done a TSMC 7nm against RDNA1 they probably would be starting to get in that danger zone - I'm sure they were watching it carefully during the Maxwell era too.

(The people who imagined some giant falling-out between NVIDIA and TSMC are pretty funny in hindsight. (A) NVIDIA still had parts at TSMC anyway, and (B) TSMC obviously couldn't have provided the same volume as Samsung did, certainly not at the same price, and volume ended up being a godsend during the pandemic shortages and mining. Yeah, shortages sucked, but they could still have been worse if NVIDIA was on TSMC and shipping half or 2/3rds of their current volume.)

Of course, now we may see that dynamic flip with AMD moving to MCM products earlier. Or maybe that won't be for another year or so yet: rumors are suggesting monolithic midrange chips will be AMD's first product. Or perhaps "monolithic", being technically MCM but with cache dies/IO dies rather than multiple compute dies. But with RDNA3, AMD is potentially poised to push NVIDIA a little bit, rather than just being the controlled opposition we've seen for the past few generations, hence NVIDIA reportedly moving to TSMC N5P and going quite large with a monolithic chip to compete.

marcosdumay 1487 days ago

> Given all the talk about OpenMP compatibility and Fortran... my guess is that they're largely running legacy code in Fortran.

The most used linear algebra library is written in Fortran. There's nothing "legacy" about it; it's just that nobody was able to replicate its speed in C.

paulmd 1487 days ago

I don't remember the exact specifics, but Fortran disallows some of the constructs that C/C++ struggle with aliasing on, so Fortran can often be (safely) optimized to much higher-performance code because of this limitation/knowledge.

Like, it's always seemed like there's a certain amount of fatalism around Undefined Behavior in C/C++, like this is somehow how it has to be to write fast code but... it's not. You can just declare things as actually forbidden rather than just letting the compiler identify a boo-boo and silently do whatever the hell it wants.

Of course it's not the right tool for every task, I don't think you'd write bit-twiddling microcontroller stuff in fortran, or systems programming. But for the HPC space, and other "scientific" code? Fortran is a good match and very popular despite having an ancient legacy even by C/C++ standards (both have, of course, been updated through time). Little less flexible/general, but that allows less-skilled programmers (scientists are not good programmers) to write fast code without arcane knowledge of the gotchas of C/C++ compiler magic.

jabl 1487 days ago

> I don't remember the exact specifics, but Fortran disallows some of the constructs that C/C++ struggle with aliasing on, so Fortran can often be (safely) optimized to much higher-performance code because of this limitation/knowledge.

For a crude approximation, Fortran is somewhat equivalent to C code where all pointer function arguments are marked with the restrict keyword.
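A small C sketch of that approximation (the function names are hypothetical). In the first version the compiler has to allow for `out` overlapping `a` or `b`; in the second, `restrict` promises that it does not, which is roughly the assumption a Fortran compiler gets to make about array dummy arguments by default. Passing overlapping arrays to the second version is undefined behavior that no compiler is required to flag.

```c
#include <stddef.h>

/* Plain C: out may overlap a or b, so every store to out[i] could change a
 * later a[j] or b[j]. The compiler must either prove there is no overlap,
 * emit a runtime overlap check, or avoid reordering and vectorizing. */
void add_scaled(size_t n, double *out, const double *a, const double *b, double s)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + s * b[i];
    }
}

/* With restrict: the caller promises out does not overlap a or b, so loads
 * and stores can be reordered and vectorized without any overlap check. */
void add_scaled_restrict(size_t n, double *restrict out,
                         const double *restrict a,
                         const double *restrict b, double s)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + s * b[i];
    }
}
```

For non-overlapping inputs the two functions compute the same result; the difference is only in what the optimizer is allowed to assume.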

> Like, it's always seemed like there's a certain amount of fatalism around Undefined Behavior in C/C++, like this is somehow how it has to be to write fast code but... it's not. You can just declare things as actually forbidden rather than just letting the compiler identify a boo-boo and silently do whatever the hell it wants.

Well, it's kind of more dangerous than C in this respect. The aliasing restriction is a restriction on the Fortran programmer; the compiler or runtime is not required to diagnose violations, meaning that the Fortran compiler is allowed to optimize assuming that two pointers don't alias.

That being said, in general I'd say Fortran has fewer footguns than C or C++, and is thus often a better choice for a domain expert who just wants to crunch numbers.

jcranmer 1487 days ago

> The most used linear algebra library is written in Fortran.

My understanding is that most supercomputers have the vendor provide their implementation of BLAS (e.g., if it's Intel-based, you're getting MKL) that's specifically tuned for that hardware. And these implementations stand a decent chance of being written in assembly, not Fortran.

bee_rider 1487 days ago

Usually C or Fortran superstructure, and assembly kernels.

The clearest form of this is in BLIS, which is a C framework you can drop your assembly kernel into, and then it makes a BLAS (along with some other stuff) for you. But the idea is also present in OpenBLAS.
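A rough sketch of that split, in C. This is not the BLIS API: `microkernel_fn`, `ref_kernel`, `gemm_tiled`, and the 4x4 tile size are all invented for illustration, and real libraries add packing and cache blocking on top. The point is only that the portable loop nest and the tuned kernel are separate pieces, so the kernel can be swapped for a hand-written one per architecture.

```c
#include <stddef.h>

/* Hypothetical register-tile size; real libraries pick this per CPU. */
static const size_t MR = 4;
static const size_t NR = 4;

/* Kernel slot: update an mr x nr tile of C with (mr x k of A) * (k x nr of B).
 * All matrices are row-major with the given leading dimensions. */
typedef void (*microkernel_fn)(size_t mr, size_t nr, size_t k,
                               const double *a, size_t lda,
                               const double *b, size_t ldb,
                               double *c, size_t ldc);

/* Portable reference kernel, standing in for an assembly kernel. */
static void ref_kernel(size_t mr, size_t nr, size_t k,
                       const double *a, size_t lda,
                       const double *b, size_t ldb,
                       double *c, size_t ldc)
{
    for (size_t i = 0; i < mr; ++i) {
        for (size_t p = 0; p < k; ++p) {
            double aip = a[i * lda + p];
            for (size_t j = 0; j < nr; ++j) {
                c[i * ldc + j] += aip * b[p * ldb + j];
            }
        }
    }
}

/* Superstructure: C (m x n) += A (m x k) * B (k x n), walking C tile by tile
 * and delegating each tile to whichever kernel the caller supplies. */
void gemm_tiled(size_t m, size_t n, size_t k,
                const double *a, const double *b, double *c,
                microkernel_fn kernel)
{
    for (size_t i = 0; i < m; i += MR) {
        size_t mr = (m - i < MR) ? (m - i) : MR;   /* partial tile at the edge */
        for (size_t j = 0; j < n; j += NR) {
            size_t nr = (n - j < NR) ? (n - j) : NR;
            kernel(mr, nr, k, a + i * k, k, b + j, n, c + i * n + j, n);
        }
    }
}
```

Calling `gemm_tiled(m, n, k, a, b, c, ref_kernel)` uses the portable kernel; an architecture-specific build would pass a tuned kernel with the same signature instead.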

Lots of this is due to the legacy of GotoBLAS (which was forked into OpenBLAS, and partially inspired BLIS), written by the somewhat famous (in HPC circles at least) Kazushige Goto. He works at Intel now, so probably they are doing something similar.

dragontamer 1487 days ago

BLAS itself has been rewritten in Nvidia CUDA and AMD HIP, and is likely the workhorse in this case. (Remember that Frontier is mostly GPUs and the bulk of code should be GPU compatible)

Presumably that old Fortran code has survived many generations of ports: Connection Machine, DEC Alpha, Intel Itanium, SPARC and finally today's GPU heavy systems. The BLAS layer keeps getting rewritten but otherwise the bulk of the simulators still works.

bee_rider 1487 days ago

I think you've made a slightly bigger claim than is necessary, which has led to a focus on BLAS, which misses the point.

The best BLAS libraries use C and assembly. This is because BLAS is the de facto standard interface for linear algebra code, and so it is worthwhile to optimize it to an extreme degree (given infinite programmer-hours, C can beat any language, because you can embed assembly in C).
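As a toy illustration of embedding assembly in C, assuming GCC or Clang extended inline asm on x86-64 (the `add_f64` name is hypothetical); any other compiler or target takes the plain C fallback. Real BLAS kernels are far larger than a single instruction, but the mechanism is the same.

```c
/* Hypothetical helper: add two doubles with a hand-picked SSE2 instruction.
 * Assumes GCC/Clang extended asm in the default AT&T syntax on x86-64. */
static inline double add_f64(double x, double y)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* "+x": x lives in an SSE register and is both read and written.
     * "x":  y is a read-only SSE register input.
     * AT&T operand order is source, destination, so this does x += y. */
    __asm__("addsd %1, %0" : "+x"(x) : "x"(y));
    return x;
#else
    return x + y;   /* portable fallback */
#endif
}
```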

But for those numerical codes which aren't incredibly hand-optimized, Fortran makes nice assumptions, so a compiler should be able to optimize the output of a moderately skilled programmer pretty well (hey, we aren't all experts, right?).

nspattak 1487 days ago

If you are talking about Netlib BLAS/LAPACK, I am very confused by what you are saying, because the fastest BLAS/LAPACK implementations are in C/C++.