Hope you don't mind, but I have a rant I need to get out. I decided to give this another try now that you've mentioned it. Let's get started the way the Arch Wiki suggests:
$ sudo pacman -S rocm-hip-sdk
$ /opt/rocm/bin/clinfo
ERROR: clGetPlatformIDs(-1001)
$ sudo /opt/rocm/bin/clinfo
...
Board name: AMD Radeon RX 6600 XT
...
OK, I wonder what's wrong. Maybe it's this? https://stackoverflow.com/questions/4959621/error-1001-in-cl... Nope. Anything about this on the Arch Wiki? Nope. This bug report [2] from 2021? Maybe I need to update my groups.

[2]: https://github.com/RadeonOpenCompute/ROCm/issues/1411

$ ls -la /dev/kfd
crw-rw-rw- 1 root render 237, 0 Sep 26 20:33 /dev/kfd
$ sudo usermod -aG render $(whoami)
$ # relogin
$ /opt/rocm/bin/clinfo
ERROR: clGetPlatformIDs(-1001)
OK, I'm a pretty advanced Linux user, so I'll just jump right in:
$ strace /opt/rocm/bin/clinfo
...
openat(AT_FDCWD, "rusticl.icd", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
Apparently I have some leftover environment variables (OCL_ICD_VENDORS) from the last time I spent half a day trying to get this to work. I can fix that. After all, it'd be entirely unreasonable to expect ROCm to give me a better error, like "Could not open OpenCL ICD `rusticl.icd`".

Success:
$ /opt/rocm/bin/clinfo
Number of platforms: 1
...
Board name: AMD Radeon RX 6600 XT
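A stray OCL_ICD_VENDORS like the one above is the kind of thing a quick check can catch before any OpenCL program runs. Below is a minimal Python sketch, assuming the ocl-icd loader (which reads OCL_ICD_VENDORS and otherwise scans /etc/OpenCL/vendors for *.icd files). `diagnose_icd_env` is a hypothetical helper, not part of ROCm or ocl-icd, and it only inspects paths; it never loads an ICD.

```python
import os
from pathlib import Path

# Assumption: ocl-icd scans this directory for *.icd files when OCL_ICD_VENDORS is unset.
DEFAULT_VENDORS_DIR = "/etc/OpenCL/vendors"


def diagnose_icd_env(environ, default_dir=DEFAULT_VENDORS_DIR):
    """Hypothetical helper: return warnings about the OpenCL ICD setup.

    An empty list means nothing obviously wrong was found.
    """
    override = environ.get("OCL_ICD_VENDORS")
    if override is not None:
        # A relative value such as "rusticl.icd" is resolved against the
        # current working directory, which is rarely what was intended.
        if not Path(override).exists():
            hint = "" if os.path.isabs(override) else " (relative path, resolved against the current directory)"
            return [f"OCL_ICD_VENDORS={override!r} does not exist{hint}; try unsetting it"]
        return []

    # No override: check that the default vendor directory has at least one ICD file.
    vendors = Path(default_dir)
    icd_files = sorted(vendors.glob("*.icd")) if vendors.is_dir() else []
    if not icd_files:
        return [f"no .icd files found in {default_dir}"]
    return []


if __name__ == "__main__":
    warnings = diagnose_icd_env(os.environ)
    for line in warnings or ["no obvious OpenCL ICD problems found"]:
        print(line)
```

Run from a shell where OCL_ICD_VENDORS=rusticl.icd is still exported, it would flag the variable as a relative path that does not exist.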
Well, let's run some apps!
$ darktable -d opencl
...
[dt_opencl_device_init]
DEVICE: 0: 'gfx1032'
PLATFORM NAME & VENDOR: AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc.
...
PHI node has multiple entries for the same basic block with different incoming values!
%967 = phi float [ %largephi.extractslice0, %sw.default ], [ %largephi.extractslice055, %sw.bb667 ], [ %largephi.extractslice059, %sw.bb663 ], [ %largephi.extractslice063, %sw.bb659 ], [ %largephi.extractslice067, %sw.bb655 ], [ %largephi.extractslice071, %sw.bb646 ], [ %largephi.extractslice075, %_Z4fmodff.exit16 ], [ %largephi.extractslice079, %_Z4fmodff.exit13 ], [ %largephi.extractslice083, %_Z4fmodff.exit ], [ %largephi.extractslice087, %sw.bb562 ], [ %largephi.extractslice091, %sw.bb555 ], [ %largephi.extractslice095, %sw.bb533 ], [ %largephi.extractslice099, %if.then502 ], [ %largephi.extractslice0103, %if.else517 ], [ %largephi.extractslice0107, %if.then456 ], [ %largephi.extractslice0111, %if.else471 ], [ %largephi.extractslice0115, %if.then393 ], [ %largephi.extractslice0119, %if.else408 ], [ %largephi.extractslice0123, %if.then338 ], [ %largephi.extractslice0127, %if.else353 ], [ %largephi.extractslice0131, %if.then283 ], [ %largephi.extractslice0135, %if.else298 ], [ %largephi.extractslice0139, %if.then224 ], [ %largephi.extractslice0143, %if.else241 ], [ %largephi.extractslice0147, %sw.bb193 ], [ %largephi.extractslice0151, %sw.bb180 ], [ %largephi.extractslice0155, %sw.bb168 ], [ %largephi.extractslice0159, %sw.bb158 ], [ %largephi.extractslice0163, %sw.bb147 ], [ %largephi.extractslice0167, %if.then116 ], [ %largephi.extractslice0171, %if.else131 ], [ %largephi.extractslice0175, %sw.bb71 ], [ %largephi.extractslice0179, %sw.bb ], [ %largephi.extractslice0183, %if.end ], [ %largephi.extractslice0187, %if.end ], [ %largephi.extractslice0191, %if.end ], [ %largephi.extractslice0195, %if.end ], [ %largephi.extractslice0199, %if.end ]
label %if.end
%largephi.extractslice0183 = extractelement <4 x float> %div, i64 0
%largephi.extractslice0191 = extractelement <4 x float> %div, i64 0
in function blendop_Lab
LLVM ERROR: Broken function found, compilation aborted!
[1] 27586 IOT instruction (core dumped) darktable -d opencl
Uh, that's great. Maybe Blender? It worked! Not too bad for a 2-minute render: https://i.imgur.com/FD1SsQG.png

What about PyTorch? It's what prompted this whole thing anyway:
$ sudo pacman -S python-pytorch-rocm python-torchvision
$ python neural_style/neural_style.py eval --content-image ../../2min.png --model ./saved_models/mosaic.pth --output-image out.png --cuda 1
[1] 32471 segmentation fault (core dumped) python neural_style/neural_style.py eval --content-image ../../2min.png
$ sudo dmesg --follow
[ 2467.536713] python[33309]: segfault at 68 ip 00007f12c5504d5d sp 00007ffc8f539c20 error 4 in libamdhip64.so.5.6.31062[7f12c541e000+357000] likely on CPU 14 (core 7, socket 0)
[ 2467.536727] Code: ec 78 48 89 bd 78 ff ff ff 64 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 85 f6 0f 88 09 03 00 00 48 8b 85 78 ff ff ff 48 63 de <48> 8b 50 68 48 8b 40 70 48 89 85 70 ff ff ff 48 29 d0 48 c1 f8 03
Uh oh. Maybe I can crack some passwords?
$ hashcat -m 0 -a 0 -o cracked.txt target_hashes.txt /usr/share/dict/american-english
...
hiprtcCompileProgram(): HIPRTC_ERROR_COMPILATION
error: unknown argument: '-flegacy-pass-manager'
1 error generated when compiling for gfx1032.
* Device #1: Kernel /usr/share/hashcat/OpenCL/shared.cl build failed.
Well, so much for that. The best I can get working with ROCm is 1 out of 4 apps.