by vlakreeh 907 days ago
What card could you have bought that lost support months later? I can't think of any new card you could have bought that'd lose support that quickly as far back as 2016

https://github.com/ROCm/ROCm/issues/1353

Bought in 2020. Stopped working in 2020. Not the latest, but in-production, advertised ROCm-capable, and what I could find during the Great GPU Shortage of 2020.

Wow, not even an industry-standard-for-many-decades deprecation notice up front, to give people a heads-up beforehand.

That's pretty fucked. :(

It's even worse.

Order of operations:

1. It just silently stopped working, started crashing, and I had no idea what was going on or why. Just sort of intermittent f-age where some older versions would work better, and newer ones would work worse. I never got it reliably working. In most cases, at some point, the system would become unresponsive, and then crash hard.

2. AMD removed it from the supported list (with no notice).

3. The github ticket above was filed, which explained what was going on.

Lots of time wasted debugging. This interacted with a half-dozen other AMD bugs and issues (such as learning that the GPU only worked for compute when headless; I needed to drive my monitor with a different card).

Human time is more expensive than equipment, so the total cost of this stuff was astronomical.

Yes, notice is industry-standard, but at the very least, when support broke / was later removed, the driver could try warning me "We no longer support this card, do you REALLY want to proceed?" rather than letting me know by hard-crashing my system.
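To make that concrete, here is a rough sketch of the kind of pre-flight check I mean. Everything in it is an assumption for illustration: the architecture names, the `SUPPORT_TABLE` contents, and the `preflight` function are hypothetical, and none of this is how ROCm or the AMD driver actually works.

```python
from enum import Enum


class SupportStatus(Enum):
    SUPPORTED = "supported"
    MAINTENANCE = "maintenance"
    UNSUPPORTED = "unsupported"


# Hypothetical table for illustration only. These names are placeholders,
# NOT AMD's real support matrix.
SUPPORT_TABLE = {
    "gfx_new": SupportStatus.SUPPORTED,
    "gfx_aging": SupportStatus.MAINTENANCE,
    "gfx_old": SupportStatus.UNSUPPORTED,
}


def preflight(arch: str, confirm=input) -> bool:
    """Return True if the caller should go ahead and use this GPU."""
    # Unknown architectures are treated as unsupported rather than guessed at.
    status = SUPPORT_TABLE.get(arch, SupportStatus.UNSUPPORTED)
    if status is SupportStatus.SUPPORTED:
        return True
    if status is SupportStatus.MAINTENANCE:
        print(f"Warning: {arch} is in maintenance mode; expect fixes only.")
        return True
    # Ask before proceeding instead of letting the user find out via a crash.
    answer = confirm(
        f"We no longer support {arch}. Do you REALLY want to proceed? [y/N] "
    )
    return answer.strip().lower() == "y"
```

The point is only that the check fails loudly and early: an unsupported card produces a prompt, and anything other than an explicit "y" stops the run.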

People don't seem to understand how ROCm fails. Some inaccessible list buried deep down somewhere, or a GitHub issue, says your card is dropped. Apparently the average user is supposed to spend ages researching this. When you try it out, as any sane person does, you don't get a nice "unsupported GPU" message from ROCm. The failure when you use ROCm is instability of your entire OS, not a clean crash of the program you ran. This invites lots of messing around and desperately trying to get it to work, which is a very frustrating experience, and then all people do is say "look in this obscure GitHub issue, they dropped support for your GPU, you're the one in the wrong".
Not GP, but they dropped support for SKUs which were still being sold at that point. The MI50 would be a fun accelerator to play with and easily worth $2000 if it were made by Nvidia. Since it is made by AMD and not supported, you can grab one for $200.

https://github.com/ROCm/ROCm/issues/2308

Support still exists, but the card is in maintenance mode. This doesn't mean that your models that work on an MI50 today are suddenly going to stop working, or that the models I run on my MI25 will suddenly cease to work.
You are right, but I'm not willing to bother with that. From what I've read ROCm is a mess to work with even on supported hardware, so yeah..
Install Ubuntu, install ROCm, run your model. It's really not that complicated, even with ancient MI25 cards.
That's only true if you don't mind sharing your systems with strangers in Russia, Asia, and Africa.

"Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs till Q2 2024"

Worse, a lot of tooling will break before then. If you need a security patch for anything built on top of ROCm or CUDA, you often can't have an ancient version.

That's not to mention that I'd like to do new development and prototyping.

I don't do business with AMD for the same reason I don't do business with Google. They're not a reliable partner.

That the only bad actors are in Russia, Asia and Africa is becoming quite the trope.

Some very spectacular compromises by Lapsus$ aren’t originating from any of the above, but instead from the West, specifically the UK:

https://www.bbc.com/news/technology-67663128

I think you're missing the point. Bad actors are equally distributed around the world.

The difference, however, is that, in this case, as the article points out, the bad actor is in custody.

My local law enforcement, lawyer, and judicial system are able to help out with the bad actors here and keep the problem somewhat contained.

For a bad actor in Russia or India, you quite literally have no recourse. There are literally Youtube videos (Mark Rober did a special recently) showing out-of-jurisdiction scammer and hacker shops occupying entire office buildings, and there is almost nothing we can do about it.

I'm not even going to go on a diatribe about corruption and the quality of law enforcement, since one can have different opinions. What I will point out is that the US and EU have extensive treaties which allow my local law enforcement agency to cooperate closely with law enforcement agencies across the US, UK, and EU.

For a bad actor in Iran or North Korea? You're not even getting diplomatic contact.

Footnote: I mention India since they're a democracy and by all standards, try to act responsibly on the global stage. They just don't happen to be in-network for various US/EU-centric international conventions.

The ROCm tooling only supports Ubuntu, Fedora and a few other enterprise distros, but it is on track to make it into the main Debian archive, at which point the Debian maintainers will backport fixes as needed.

The tooling for older MI25 cards has yet to break. Please avoid creating FUD on hardware you don't own.

That is the thing about AMD GPUs, nobody sane wants to own the GPUs that work with ROCm.