Hacker News
by SwellJoe 66 days ago
You can run pretty much every model on Vulkan, including the Qwen MoE models. You can also run pretty much every model on ROCm, Apple Silicon via MLX, and Intel hardware via OpenVINO. Nvidia got there first, but they're no longer clearly dominant in the self-hosting space, simply because of the high cost. I think Apple probably has the lead there, due to unified memory allowing big models to run without multiple big dedicated GPUs, but stuff like Strix Halo with 128GB of unified memory is also pretty much sold out everywhere. There's a lower bound on how small a model can be and still be useful.

Anyway, I don't have any Nvidia hardware, and I've got several local models running and/or training at all times.
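
To put rough numbers on the unified-memory point above, here is a back-of-the-envelope sketch in Python. It only uses the fact that weight storage is roughly parameter count times bits per weight. The 20% overhead for KV cache and runtime buffers is an assumed placeholder, not a measured figure, and the 70B model size and the 24 GB card are hypothetical examples rather than anything named in the thread.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float, memory_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """Rough check of whether a model fits in a given memory pool.

    overhead_fraction is an assumed allowance for KV cache, activations,
    and runtime buffers; real overhead depends on context length and runtime.
    """
    needed = weight_footprint_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


# Hypothetical 70B-parameter model at 4-bit and 16-bit precision
for bits in (4, 16):
    size = weight_footprint_gb(70, bits)
    print(f"70B @ {bits}-bit: ~{size:.0f} GB weights, "
          f"fits in 128 GB: {fits(70, bits, 128)}, "
          f"fits in 24 GB: {fits(70, bits, 24)}")
```

Under these assumptions, a quantized 70B-class model fits comfortably in a 128 GB unified pool but not on a single 24 GB card, which is the gap the comment is pointing at. It says nothing about generation speed, which depends on memory bandwidth and the backend.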

1 comment

Yes, but they're claiming massive generation speeds, which you won't get on Vulkan. You won't get them with ROCm on that Strix Halo, either.

It's just funny that they talk about vendor lock-in, and they only support Nvidia.