by tmzt 45 days ago
I'm doing the same for Apple M-series, with fused WGSL shaders specifically targeting Qwen3/3.5.

My effort is called shady-thinker and is on GitHub at github.com/tmzt/shady-thinker.

This was inspired in part by Antirez's earlier work with C kernels, as well as other efforts to support in-browser LLMs. I've adapted those ideas to Rust and the wgpu library.

Gemma 4 is the next likely target (with the MTP work), as I'm experimenting with local AI agents.

I'd love to see what you've done to improve prefill and decode, even if it's not directly applicable.

One difference: I'm using MLX and GPTQ 4-bit quants, including AutoRound, with safetensors. My shader pipeline is pretty much fixed for each model, so ggml just adds unnecessary complexity.
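
For anyone unfamiliar with 4-bit quants, here is a rough CPU-side sketch in Rust of what group-wise dequantization looks like. It is not taken from shady-thinker: the function names are hypothetical, and the low-nibble-first packing and the `scale * (q - zero)` formula are assumptions about a typical GPTQ-style layout (MLX-style quants use a scale and bias instead, and real checkpoints differ in how scales and zero points are stored).

```rust
/// Unpack eight 4-bit values from one 32-bit word.
/// Assumption: the lowest nibble holds the first value.
fn unpack_nibbles(word: u32) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = ((word >> (4 * i)) & 0xF) as u8;
    }
    out
}

/// Dequantize one packed word using a per-group scale and zero point.
/// Assumption: w = scale * (q - zero), the usual asymmetric scheme.
fn dequantize_word(word: u32, scale: f32, zero: u8) -> [f32; 8] {
    let q = unpack_nibbles(word);
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = scale * (q[i] as f32 - zero as f32);
    }
    out
}
```

In a fused shader the same unpack-and-scale step would presumably happen inline, right before the multiply-accumulate, so the full-precision weights never get written back to memory.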