


	by fhqghds 2186 days ago
	Get ready for a surprise then: all those FLOPS are coming from the ARM cores.... This beast has no GPUs: https://postk-web.r-ccs.riken.jp/spec.html


Merrill 2186 days ago

It looks like this is not an ARM core, but a Fujitsu implementation of the Armv8-A instruction set and the Fujitsu-developed Scalable Vector Extension. Most likely the latter is doing all the heavy lifting.

https://www.fujitsu.com/global/about/resources/news/press-re...

>A64FX is the world's first CPU to adopt the Scalable Vector Extension (SVE), an extension of Armv8-A instruction set architecture for supercomputers. Building on over 60 years' worth of Fujitsu-developed microarchitecture, this chip offers peak performance of over 2.7 TFLOPS, demonstrating superior HPC and AI performance.

d_tr 2186 days ago

The text you linked to actually says that the SVE was developed cooperatively by Fujitsu and ARM, without, however, going into details about who did what.

numpad0 2186 days ago

There's word floating around that the A64FX is basically a SPARC with the ARM ISA, without much ARM IP in it. No idea how accurate that is, but it's intriguing.

leeter 2186 days ago

So, looking at AnandTech's breakdown, the CPUs are closer to a Knights Landing 'CPU/GPU' than a traditional CPU (currently). They also have a ton of HBM2 right next to the dies, so this should be insanely fast: they can feed those cores very quickly regardless of how fast each core is by clock and pipeline. That should massively reduce stalls.

stephencanon 2186 days ago

The "traditional CPU" portion of the core is a bit more capable than KNL, but yeah, that's roughly accurate.

leeter 2186 days ago

Oh, agreed, but what makes this so interesting is how tuned it is. I'm honestly surprised we haven't seen Intel or AMD ship an HPC CPU with on-package HBM2 yet.

m_mueller 2186 days ago

Besides FLOP/Watt, what's also very interesting here is the FLOP/Byte ratio (compute relative to memory bandwidth). It has kept the same balance as the K computer, i.e. it is geared toward scientific workloads and not just benchmarks (duh, but worth pointing out here, as it makes this machine quite special, especially compared to Xeon-based clusters; IMO Intel has dropped the ball on bandwidth for the last 5 years or so).
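
To make the FLOP/Byte point concrete, here is a rough sketch of the balance calculation. The 2.7 TFLOPS peak comes from the Fujitsu quote earlier in the thread; the 1024 GB/s memory bandwidth is an assumed per-chip HBM2 figure that is not stated in this thread, and the function names are made up for illustration.

```python
def bytes_per_flop(peak_flops, mem_bandwidth_bytes):
    """Machine balance: bytes of memory traffic available per peak FLOP."""
    return mem_bandwidth_bytes / peak_flops


def attainable_flops(peak_flops, mem_bandwidth_bytes, flops_per_byte):
    """Simple roofline bound: a kernel is limited by compute or by memory."""
    return min(peak_flops, mem_bandwidth_bytes * flops_per_byte)


PEAK = 2.7e12         # peak FLOP/s per chip, from the press release quote
BANDWIDTH = 1.024e12  # ASSUMED bytes/s of HBM2 bandwidth per chip

balance = bytes_per_flop(PEAK, BANDWIDTH)

# A stream-like kernel such as y[i] = a*x[i] + y[i] in double precision does
# 2 FLOPs per 24 bytes moved (load x, load y, store y), so it is bound by
# memory bandwidth long before it reaches the compute peak.
daxpy_bound = attainable_flops(PEAK, BANDWIDTH, 2 / 24)

print(f"balance: {balance:.2f} bytes/FLOP")
print(f"daxpy bound: {daxpy_bound / 1e9:.0f} GFLOP/s")
```

Under these assumptions, a bandwidth-bound kernel gets only a small fraction of peak, which is why the bytes-per-FLOP balance matters more for many scientific codes than the headline FLOPS number.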

gnufx 2186 days ago

As an early user of KNL, I don't get the "GPU" bit. KNL runs normal x86_64 code and doesn't look that much different to the AMD Interlagos systems I once used apart from the memory architecture.

leeter 2186 days ago

It comes from the fact that KNL came from Larrabee which was actually developed as a GPU initially (and even ran games... sort of) but was never actually released. The next revision of that was the Xeon Phi chips you used. So the connection is "Lots of small cores with lots of high bandwidth ram" although these cores are definitely superscalar where Larrabee and derivatives were not really.

https://en.wikipedia.org/wiki/Xeon_Phi https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)

gnufx 2186 days ago

Sure, but people don't normally think of "GPU" in this context as something that just runs all your x86_64 code.

ViralBShah 2186 days ago

That's pretty cool! That probably means that applications will have an easier time. Looks like it has 512-bit SIMD.

I wonder what BLAS they are using, and if the contributions are open sourced.

gnufx 2186 days ago

(SVE isn't 512-bit SIMD like AVX512.) I don't know what BLAS they're using, though I know they've long worked on their own, but BLIS has gained SVE support recently, for what it's worth.

floatboth 2186 days ago

SVE is whatever width the chip designer wants; Fujitsu's implementation is 512-bit, according to AnandTech.

gnufx 2186 days ago

I know, but it's different apart from coming in different hardware widths, as ARM techies will gush.

jabl 2186 days ago

Yes, SVE, like the RISC-V vector extension, is a "real" vector ISA, with things like a vector length register (no need for a scalar loop epilogue), scatter/gather memory ops for sparse matrix work, mask registers for if-conversion, and looser alignment requirements (less or no need for loop prologues).

That being said, apart from becoming wider, AVX-NNN has also gotten more "real" vector features with every generation. The difference might not be as huge anymore.
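
The "no scalar loop epilogue" and "whatever width the chip designer wants" points can be sketched with a small pure-Python simulation. This is not SVE code: `whilelt` is only modeled on the idea behind SVE's WHILELT predicate instruction, `vla_axpy` is a hypothetical name, and real hardware would execute each predicated step as a single vector operation.

```python
def whilelt(i, n, lanes):
    """Predicate with lane k active while i + k < n (modeled on WHILELT)."""
    return [i + k < n for k in range(lanes)]


def vla_axpy(a, x, y, vector_bits, elem_bits=64):
    """Compute a*x + y in vector-sized steps with no scalar tail loop."""
    lanes = vector_bits // elem_bits  # hardware-dependent lane count
    n = len(x)
    out = list(y)
    i = 0
    while i < n:
        pred = whilelt(i, n, lanes)
        # One predicated "vector" operation: inactive lanes are skipped,
        # so the final partial step needs no separate epilogue.
        for k, active in enumerate(pred):
            if active:
                out[i + k] = a * x[i + k] + y[i + k]
        i += lanes
    return out


x = [float(v) for v in range(11)]  # 11 is not a multiple of any lane count
y = [1.0] * 11

# The same loop runs unchanged at different simulated hardware widths.
results = {bits: vla_axpy(2.0, x, y, bits) for bits in (128, 512, 2048)}
for bits, r in results.items():
    print(bits, r)
```

The loop body never mentions a fixed width, which is the sense in which such code is vector-length agnostic; with a fixed-width ISA the remainder elements would typically need a scalar loop or an explicitly masked final iteration.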

d_tr 2186 days ago

I am really happy to have come across this post, mainly due to this fact.