by chaosfox 2040 days ago
Unfortunately, having a competitive card is only half the battle. They also need all the deep learning libraries to support this card, otherwise nobody is going to bother. I hope AMD understands that and enlists their own engineers to help the community make support for this card solid.

ROCm (https://rocmdocs.amd.com/en/latest/) is their compute framework/stack. It's not as good as CUDA, but it has support for TensorFlow etc.

The problem is that ROCm is Linux-only, which is still a huge downside, and it doesn't have good support (or any support at all) for consumer-grade GPUs. From Polaris onwards it's pretty much "good luck"; heck, even the Radeon VII isn't well supported.

CUDA works because any NVIDIA GPU will run CUDA. This means it's easier to learn, easier to prototype and easier to ship, and the code you ship isn't limited to the datacenter.

What AMD needs to do to "win" an HPC GPU launch is to have an event which is 95% "How we fixed ROCm, and here is our full software roadmap and support guarantee for the next 5 years" and the remaining 5% "oh btw here is our new silicon, it's really fast and shiny".

I would be interested in knowing how much compute happens away from Linux. My impression is that almost nobody uses Windows for these tasks, but anecdotes are not data. There are of course workstation type acceleration tasks like simulation that are very Windows-heavy (e.g. ANSYS), but I am not privy to the breakdown of compute demand per segment.

> What AMD needs to do to "win" an HPC GPU launch is to have an event which is 95% "How we fixed ROCm, and here is our full software roadmap and support guarantee for the next 5 years" and the remaining 5% "oh btw here is our new silicon, it's really fast and shiny".

This is a great point. NVDA has worked on CUDA for years and has a great ecosystem of material and questions on places like Stack Exchange. AMD will have to work very purposefully to close the gap, but it seems like they are aware and headed in the right direction.

The software side of AMD has been (in preceding years) a disaster. I say that as someone interested in their products, who would love to see a realistic competitor to CUDA.

AMD's Linux support has been somewhere between no-assed and half-assed for the preceding decade. I think they believed that the world would return to Windows for everything. That ship sailed long ago. The point about CUDA being usable up/down the HW stack is quite salient. When I develop GPU things, I start on my laptop's GTX 1060, test on my deskside RTX 2060, and run them on V100s. Code is in Julia, C, Fortran, so it should work anywhere with good underlying library support. I've got a Zen laptop with integrated Radeon. No dice, can't do computing on it (yet).

AMD's function/library support is nascent, and will take years to get to a viable point for many.

I am hoping ... hoping ... that AMD sees this as an opportunity long term, and not a short term expense that must provide immediate ROI. SW ecosystems drive the HW purchases, but there is usually a lag of years before this engine really gets started.

AMD needs to be in this for the long haul.

Your impression is wrong: there is a metric ton of enterprise and consumer software that uses CUDA and runs only on Windows.

There are also whole "data sciences" divisions in blue-chip companies that are running Windows.

Case in point: I work for a huge financial company, and we have CUDA-powered Excel add-ins/macros...

And no I'm not joking https://on-demand.gputechconf.com/gtc/2010/presentations/S12...

And engineering, sciences and medical consumer applications are also more often than not Windows-only or Windows-first.

Then you have all the less-enterprisey stuff: video and photo editing, filters, chess programs, w/e...

And lastly, the biggest point is that Windows and consumer-grade hardware is where most developers and students live. Good luck running ROCm on your laptop, and no, I really mean it: it's officially not supported, and in reality, even if you manage to get a moderately compatible chip, you'll encounter more bugs than on Klendathu.

Don't underestimate the importance of software that runs everywhere and just works. Node.js didn't become popular because JavaScript on the backend was something that was desperately needed; it became popular because you had a plethora of front-end developers that had little to no knowledge of server-side languages and frameworks.

Unlike what HN and recruiters would like you to believe, most developers can't learn 10 languages and frameworks, and definitely not well. Sure, some can, but the vast majority of developers don't spend 9 hours working and 9 hours hacking. For every dev with a GitHub account that needs its own storage rack, there are 10,000 that just do 9 to 5 and check out.

If on one hand you have a solution that forces you to pick from a narrow list of Linux kernels and supported distros and an extremely narrow list of GPUs, and you still encounter bugs at every corner, so you can maybe produce something that, if it runs, would only run on the same system as yours, and on the other hand you have a solution that would run on any OS that supports an NVIDIA GPU, you'll pick the latter unless you are really, really bored.

And that is before you entertain the marketability and job prospects of learning CUDA vs ROCm. One allows you to get a job at any place that ships something that runs on a GPU, whether it's something that occupies 1000 racks and might become sentient or something that filters Excel spreadsheets faster. The other one doesn't.

> And no I'm not joking https://on-demand.gputechconf.com/gtc/2010/presentations/S12...

Thank you for sharing the link and correcting my information bias. It sounds like the "workstation" compute world is a forest of deep niches.

You make a lot of good points about the staying power of Windows. I am excited about all the moves towards a complete Linux desktop, but am not imagining that it will be mainstream.

I’m not sure if it’s a forest of deep niches. At this point I would say that the niche is the 7-figure server racks with A100s outside of the cloud providers...

There are still more use cases for GPU compute on the edge than in the datacenter and that likely won’t change.

And as for Linux on the enterprise desktop, well, ROCm can’t run in WSL2 and CUDA can, so that’s yet another reason to bloody support Windows...

Because WSL2 is ironically probably the way forward for Linux on the desktop for the majority of the computerized workforce.

> There are of course workstation type acceleration tasks like simulation that are very Windows-heavy (e.g. ANSYS)

Funny you mention ANSYS specifically as they seem to have pretty decent Linux support:

https://www.ansys.com/solutions/solutions-by-role/it-profess...

Although only on NVIDIA hardware, if I'm interpreting it right.

> Funny you mention ANSYS specifically as they seem to have pretty decent Linux support:

With ANSYS in particular the question is who is doing the simulation. If the engineer is doing it, many engineering tools are Windows-only (although this has been improving), so it makes sense to run ANSYS under Windows as well so you can be close to your modelling software. If the stress or EM guys are separate from the designers, then it shouldn't matter as much.

I would love to see all productivity tools move to Linux, and things have been getting a lot better over the years. Personally I'm excited about the noise that Microsoft was exploring Office for Linux, as Office is the only reason I ever boot into Windows. What a godsend it would be to be able to program and run all my productivity software at the same time.

Yes, AMD’s fuckup in compute isn’t just ROCm but also their OpenCL support on Linux.

Windows still gets semi-decent support, especially for their WS cards, but Linux, oh boy...

Do not believe it. They wrote it for Windows and ported it badly.

Source: Struggled for months to get ANSYS to work even crappily on three different distros of Linux, two of which were clean installs of "officially supported" distros.

Eventually gave up, bought Windows, and installed it on that. Worked immediately.

That’s kind of the make-or-break plan for Frontier and El Capitan [1]. They’re having all the science folks try using ROCm and the HIP recompiler thing. We’ll see how that shakes out in practice.

[1] https://www.anandtech.com/show/15581/el-capitan-supercompute...

LLNL usually writes their own stack. I don't see the main API for El Capitan being anything but OpenMP, and LLNL can write, and has written, their own compilers and libraries for other GPU-powered supercomputers.

Sure, but other folks have to make use of it, too. Not everyone's code will be abstracted from CUDA. They're trying to get folks on Summit to test out HIP more strongly. Repeating my comment from last summer [1] that linked to the "try to use HIP" [2]:

> The OLCF plans to make HIP available on Summit so that users can begin using it prior to its availability on Frontier. HIP is a C++ runtime API that allows developers to write portable code to run on AMD and NVIDIA GPUs. It is essentially a wrapper that uses the underlying CUDA or ROCm platform that is installed on a system. The API is very similar to CUDA so transitioning existing codes from CUDA to HIP should be fairly straightforward in most cases. In addition, HIP provides porting tools which can be used to help port CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.

[1] https://news.ycombinator.com/item?id=20495637

[2] https://www.olcf.ornl.gov/wp-content/uploads/2019/05/frontie...
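
To make the OLCF passage quoted above a bit more concrete: it says the HIP API is very similar to CUDA and that porting tools exist, but that some manual work remains. Below is a toy Python sketch of that idea, not the actual porting tool. The function name `toy_hipify` and the tiny mapping table are made up for illustration; the CUDA and HIP names inside the table are real runtime API names, but the table covers only a handful of them.

```python
import re

# Hand-picked subset of CUDA runtime names and their HIP counterparts.
# Assumption: this toy table stands in for the far larger mapping that
# real porting tools carry; it is nowhere near complete.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# Match whole identifiers only, trying longer names first.
_KNOWN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)

# Anything still starting with "cuda" after translation was not in the table.
_LEFTOVER = re.compile(r"\bcuda\w*")


def toy_hipify(source):
    """Rename known CUDA names to HIP names.

    Returns (translated_source, leftover_names), where leftover_names is a
    sorted list of cuda* identifiers the table did not cover, i.e. the part
    that would need manual attention.
    """
    translated = _KNOWN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)
    leftover = sorted(set(_LEFTOVER.findall(translated)))
    return translated, leftover
```

Calling `toy_hipify` on a CUDA snippet returns the renamed source plus a list of anything it did not recognise. That second list is the "expect to do some manual coding" part of the quote: a simple rename gets you most of the way because the two APIs mirror each other, and whatever is left over is where the porting effort actually goes.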