by avcxz 1004 days ago
I'd also like to point out that ROCm has been packaged for Arch Linux since the beginning of 2023, with efforts starting in March 2020 [1].

Currently on Arch Linux you can run the following successfully:

  $ sudo pacman -S python-pytorch-rocm

Arch Linux even ships Blender with ROCm support.

[1] https://github.com/rocm-arch


Hope you don't mind, but I have a rant I need to get out. I decided to give this another try now that you've mentioned it.

Let's get things started the way the arch wiki suggests:

    $ sudo pacman -S rocm-hip-sdk
    $ /opt/rocm/bin/clinfo
    ERROR: clGetPlatformIDs(-1001)
    $ sudo /opt/rocm/bin/clinfo
    ...
      Board name:     AMD Radeon RX 6600 XT
    ...
Ok, I wonder what's wrong. Maybe it's this? https://stackoverflow.com/questions/4959621/error-1001-in-cl...

Nope. Anything about this on the Arch wiki? Nope.

This bug report[2] from 2021? Maybe I need to update my groups.

[2]: https://github.com/RadeonOpenCompute/ROCm/issues/1411

    $ ls -la /dev/kfd
    crw-rw-rw- 1 root render 237, 0 Sep 26 20:33 /dev/kfd
    $ sudo usermod -aG render $(whoami)
    $ # relogin
    $ /opt/rocm/bin/clinfo
    ERROR: clGetPlatformIDs(-1001)
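A minimal sketch of checking the device node itself before touching group membership. It assumes GNU coreutils `stat` and `id`, and `kfd_access_report` is a hypothetical helper name, not part of ROCm. Note that the listing above shows mode `crw-rw-rw-` (world read/write), so on this machine group membership was probably not the blocker.

```shell
# Report whether the current user can open a device node read/write,
# and which group owns it. Defaults to /dev/kfd, the ROCm compute node.
kfd_access_report() {
    dev="${1:-/dev/kfd}"
    if [ ! -e "$dev" ]; then
        echo "$dev: missing (is the amdgpu kernel driver loaded?)"
        return 1
    fi
    owner_group="$(stat -c '%G' "$dev")"
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "$dev: accessible (group $owner_group)"
    else
        # Suggest the same fix as above, using the node's actual group.
        echo "$dev: not accessible; try: sudo usermod -aG $owner_group $(id -un)"
        return 1
    fi
}

kfd_access_report || true
```

If this reports the node as accessible and clinfo still fails, the cause is likely elsewhere.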
Ok, I'm a pretty advanced linux user, I'll just jump right in:

    $ strace /opt/rocm/bin/clinfo
    ...
    openat(AT_FDCWD, "rusticl.icd", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
Apparently I have some leftover environment variables (OCL_ICD_VENDORS) from last time I spent half a day trying to get this to work. I can fix that. After all, it'd be entirely unreasonable to expect rocm to give me a better error, like "Could not open opencl icd `rusticl.icd`".
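
For anyone hitting the same -1001, a minimal sketch of checking for leftover ICD loader overrides before reaching for strace. `OCL_ICD_VENDORS` is the variable from the trace above; matching everything with the `OCL_ICD_` prefix is an assumption about what else might be lying around, and `report_icd_overrides` is a hypothetical helper name.

```shell
# Print any OCL_ICD_* variables set in the current environment.
# Exit status is 0 if at least one is set, non-zero if none are.
report_icd_overrides() {
    env | grep '^OCL_ICD_'
}

if report_icd_overrides; then
    echo "ICD loader overrides are set; try 'unset OCL_ICD_VENDORS' and rerun clinfo" >&2
fi
```

Since the variable probably came from a shell startup file, unsetting it in the current session only helps until the next login.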

Success:

    $ /opt/rocm/bin/clinfo
    Number of platforms:    1
    ...
      Board name:     AMD Radeon RX 6600 XT
Well, let's run some apps!

    $ darktable -d opencl
    ...
    [dt_opencl_device_init]
       DEVICE:                   0: 'gfx1032'
       PLATFORM NAME & VENDOR:   AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc.
    ...
    PHI node has multiple entries for the same basic block with different incoming values!
      %967 = phi float [ %largephi.extractslice0, %sw.default ], [ %largephi.extractslice055, %sw.bb667 ], [ %largephi.extractslice059, %sw.bb663 ], [ %largephi.extractslice063, %sw.bb659 ], [ %largephi.extractslice067, %sw.bb655 ], [ %largephi.extractslice071, %sw.bb646 ], [ %largephi.extractslice075, %_Z4fmodff.exit16 ], [ %largephi.extractslice079, %_Z4fmodff.exit13 ], [ %largephi.extractslice083, %_Z4fmodff.exit ], [ %largephi.extractslice087, %sw.bb562 ], [ %largephi.extractslice091, %sw.bb555 ], [ %largephi.extractslice095, %sw.bb533 ], [ %largephi.extractslice099, %if.then502 ], [ %largephi.extractslice0103, %if.else517 ], [ %largephi.extractslice0107, %if.then456 ], [ %largephi.extractslice0111, %if.else471 ], [ %largephi.extractslice0115, %if.then393 ], [ %largephi.extractslice0119, %if.else408 ], [ %largephi.extractslice0123, %if.then338 ], [ %largephi.extractslice0127, %if.else353 ], [ %largephi.extractslice0131, %if.then283 ], [ %largephi.extractslice0135, %if.else298 ], [ %largephi.extractslice0139, %if.then224 ], [ %largephi.extractslice0143, %if.else241 ], [ %largephi.extractslice0147, %sw.bb193 ], [ %largephi.extractslice0151, %sw.bb180 ], [ %largephi.extractslice0155, %sw.bb168 ], [ %largephi.extractslice0159, %sw.bb158 ], [ %largephi.extractslice0163, %sw.bb147 ], [ %largephi.extractslice0167, %if.then116 ], [ %largephi.extractslice0171, %if.else131 ], [ %largephi.extractslice0175, %sw.bb71 ], [ %largephi.extractslice0179, %sw.bb ], [ %largephi.extractslice0183, %if.end ], [ %largephi.extractslice0187, %if.end ], [ %largephi.extractslice0191, %if.end ], [ %largephi.extractslice0195, %if.end ], [ %largephi.extractslice0199, %if.end ]
    label %if.end
      %largephi.extractslice0183 = extractelement <4 x float> %div, i64 0
      %largephi.extractslice0191 = extractelement <4 x float> %div, i64 0
    in function blendop_Lab
    LLVM ERROR: Broken function found, compilation aborted!
    [1]    27586 IOT instruction (core dumped)  darktable -d opencl
Uh, that's great. Maybe Blender?

It worked! Not too bad for a 2-minute render: https://i.imgur.com/FD1SsQG.png

What about pytorch? It prompted this whole thing anyway:

    $ sudo pacman -S python-pytorch-rocm python-torchvision
    $ python neural_style/neural_style.py eval --content-image ../../2min.png --model ./saved_models/mosaic.pth --output-image out.png --cuda 1
    [1]    32471 segmentation fault (core dumped)  python neural_style/neural_style.py eval --content-image ../../2min.png
    $ sudo dmesg --follow
    [ 2467.536713] python[33309]: segfault at 68 ip 00007f12c5504d5d sp 00007ffc8f539c20 error 4 in libamdhip64.so.5.6.31062[7f12c541e000+357000] likely on CPU 14 (core 7, socket 0)
    [ 2467.536727] Code: ec 78 48 89 bd 78 ff ff ff 64 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 85 f6 0f 88 09 03 00 00 48 8b 85 78 ff ff ff 48 63 de <48> 8b 50 68 48 8b 40 70 48 89 85 70 ff ff ff 48 29 d0 48 c1 f8 03
Uh oh. Maybe I can crack some passwords?

    $ hashcat -m 0 -a 0 -o cracked.txt target_hashes.txt /usr/share/dict/american-english
    ...
    hiprtcCompileProgram(): HIPRTC_ERROR_COMPILATION

    error: unknown argument: '-flegacy-pass-manager'
    1 error generated when compiling for gfx1032.

    * Device #1: Kernel /usr/share/hashcat/OpenCL/shared.cl build failed.
Well, so much for that.

The best I can get to work with ROCm is 1 out of 4 apps.

This is why Christian and I have invested so much effort into the CI system for Debian. There needs to be a clear accounting of what works and what doesn't for every library on every architecture.

It's too late to edit, but I should add that the RX 6600 XT is not officially supported by the upstream ROCm project. It's not clear to me that the experience would be better on any other distro. That's where having public test logs would be valuable.

IIRC, nothing below the 6800 is supported by ROCm, so the lion's share of the 6000-series installed base is excluded from 'official' support. Nvidia's compute drivers support all of their devices and have done so across multiple generations; AMD's support only the low-volume devices and drop support for older generations seemingly almost as fast as they are released.

One of your problems might be that gfx1032 is not supported by AMD's ROCm packages, which have a laughably short list of supported hardware: https://rocm.docs.amd.com/en/latest/release/gpu_os_support.h...

The normal workaround is to assign the closest supported architecture, e.g. gfx1030, so `HSA_OVERRIDE_GFX_VERSION=10.3.0` might help.
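
A minimal sketch of applying that override to a single command rather than exporting it for the whole session. `run_as_gfx1030` is a hypothetical wrapper name; the value `10.3.0` is the one suggested above, and there is no guarantee it helps on any given card.

```shell
# Run one command with the override set only for that process,
# so the rest of the session keeps its normal environment.
run_as_gfx1030() {
    env HSA_OVERRIDE_GFX_VERSION=10.3.0 "$@"
}

# Example (requires ROCm to be installed):
#   run_as_gfx1030 /opt/rocm/bin/clinfo
```

Scoping it per command makes it easy to compare behavior with and without the override.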

Also, it looks like some of your tested projects are OpenCL? For me, I do something like: `yay -S rocm-hip-sdk rocm-ml-sdk rocm-opencl-sdk` to cover all the bases.

My recent interest has been LLMs, and this is my general step-by-step for those (llama.cpp, exllama), for anyone interested: https://llm-tracker.info/books/howto-guides/page/amd-gpus

I didn't port the docs back in, but also here's a step-by-step w/ my adventures getting TVM/MLC working w/ an APU: https://github.com/mlc-ai/mlc-llm/issues/787

From my experience, ROCm is improving, but there's a good reason that Nvidia has 90% market share even at big price premiums.

EDIT: apparently Darktable and Blender have OpenCL issues that are fixed in the just-released 5.7: https://github.com/ROCm-Developer-Tools/clr/issues/3

I can totally understand your frustrations, considering the rocm-arch team/community has been seeing these (and trying to fix them) for years now.

I urge you to post any problems you face on the discussions page [1] for the rocm-arch community. Just to get more visibility and to add to the corpus for others to see (or even just to complain and have a voice heard, lol).

[1] https://github.com/orgs/rocm-arch/discussions

ROCm support is typically integrated into a package by enabling it as a build flag. So even if a project supports ROCm, it won't work on ROCm platforms unless the package was actually built for ROCm targets.

For blender and python-pytorch, contributions were made to the Arch Linux build recipes so that they have ROCm support; I'm not sure about darktable. For python-torchvision, see [2] for a ROCm build of it. Maybe that helps?

[2] https://aur.archlinux.org/packages/python-torchvision-rocm

Edit: this doesn't seem to be the case for darktable. Maybe wait for rocm 5.7? idk [3].

[3] https://github.com/ROCm-Developer-Tools/clr/issues/3#issueco...

Feel free to request rocm builds of packages on https://github.com/orgs/rocm-arch/discussions.

Others have already covered that gfx1032 is not officially supported, and that since we package the source from AMD's repos, the experience may be no different on other platforms. I will say, though, that having an independent team outside AMD build and ship ROCm is great for the ROCm community: it gets the product in front of more users, and that real-world feedback goes back to the ROCm project to make it better. The rocm-arch folks have made several upstream contributions to ROCm.

I'm definitely excited about the progress of the Debian team, and we've been keeping an eye on each other's progress: https://github.com/orgs/rocm-arch/discussions/674

I could get hashcat to work with poor performance, but then the computer was unusable.