Running dual Pro B60 on Debian stable mostly for AI coding.
I was initially confused about which packages were needed (backports kernel + Ubuntu kobuk team PPA works for me). After getting that right I'm now running vllm mostly without issues (though I don't run it 24/7).
At first had major issues with model quality but the vllm xpu guys fixed it fast.
Software capability is not as good as Nvidia yet (e.g. no fp8 kv cache support last I checked) but with this price difference I don't care. I can basically run a small fp8 local model with almost 100k token context and that's what I wanted.
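For anyone trying to reproduce a setup like this, here is a rough sketch of how the launch command could be put together. It only builds the argument list for `vllm serve`. The model name is a made-up placeholder, and the context length and tensor-parallel size are assumptions based on "almost 100k" and two cards, not the exact settings used above.

```python
import shlex


def vllm_serve_command(model, max_model_len=100_000,
                       tensor_parallel_size=2, quantization=None):
    """Build an argv list for `vllm serve` (sketch, values are assumptions)."""
    args = [
        "vllm", "serve", model,
        # Cap the context window; "almost 100k" in the post, exact value assumed.
        "--max-model-len", str(max_model_len),
        # Split the model across both cards (dual-GPU setup assumed).
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
    if quantization is not None:
        # Only needed when quantizing at load time, e.g. "fp8".
        args += ["--quantization", quantization]
    return args


# "your-org/your-fp8-model" is a hypothetical placeholder, not a real checkpoint.
cmd = vllm_serve_command("your-org/your-fp8-model")
print(shlex.join(cmd))
```

If the checkpoint is already stored in fp8, the `quantization` argument can probably be left unset; pass `quantization="fp8"` when loading fp16 weights that should be quantized on the fly.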
This is an fp16 model. That's 54G in weights. I can load it only with fp8 quantization enabled (>= 128k context). I run into this error during generation though: https://github.com/vllm-project/vllm/issues/36350. Looks like an issue with the flash attention backend. But yeah, if you are OK with fp8 quantization on this model, it fits. I expect with 64G VRAM it will fit without quantization.
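The arithmetic behind that can be sketched roughly as follows. The 54G and 64G figures come from the comment. The 48G figure is an assumption (two 24G cards), and the helper names are made up for illustration. Whatever is left after the weights still has to hold the KV cache and activations, so positive headroom is necessary but not a guarantee that a given context length fits.

```python
FP16_BYTES_PER_PARAM = 2.0  # fp16 stores each weight in 2 bytes


def scaled_weights_gb(fp16_weights_gb, bytes_per_param):
    """Scale an fp16 weight footprint to another precision."""
    return fp16_weights_gb * bytes_per_param / FP16_BYTES_PER_PARAM


def headroom_gb(vram_gb, weights_gb):
    """VRAM left for KV cache and activations (negative means it does not fit)."""
    return vram_gb - weights_gb


fp16_gb = 54.0                           # from the comment above
fp8_gb = scaled_weights_gb(fp16_gb, 1.0)  # fp8 uses 1 byte per weight

# 48.0 is an assumed total for two 24G cards; 64.0 is the figure from the comment.
for vram in (48.0, 64.0):
    print(f"{vram:.0f}G VRAM: fp16 headroom {headroom_gb(vram, fp16_gb):+.0f}G, "
          f"fp8 headroom {headroom_gb(vram, fp8_gb):+.0f}G")
```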
There was a video a little while back where LTT built a computer for Linus Torvalds and they put an Intel Arc card inside, so I'd imagine Linux support is, at the very least, acceptable.
I've run Arc on Fedora for years and for general desktop use it's been perfect. For LLMs/coding it's getting better but it's rough around the edges. Had a bug where trying to get VRAM usage through PyTorch would crash the system, etc.
Quicksync doesn't do its work on the CPU; it does the work on the integrated GPU. Their processors that did not have on-board graphics did not have Quicksync support. See their P series and many of their Xeon parts, which do not carry Quicksync support, while the versions with integrated graphics do have it.
AMD chips that have integrated GPUs (their APU series of chips) often do have hardware video encoders. Because, once again, it's a function of the GPU and not the CPU.
I'm using a B580 for a Windows 10 media PC and it's fine even for moderate gaming when I drop down to 1080p on my 4K TV, although I did notice a little stuttering from time to time.
To be fair, that might be due to still running Windows 10 or due to not having reset the PC in 4 years. It's going to be moved over to Linux soon, I'm just being lazy.