| HN Mirror


by adefa 149 days ago

I ran a similar experiment last month and ported Qwen 3 Omni to llama.cpp. In less than a week, I had GGUF conversion, quantization, and all input and output modalities working. I submitted the work as a PR to the codebase and, understandably, it was rejected.

https://github.com/ggml-org/llama.cpp/pull/18404

https://huggingface.co/TrevorJS/Qwen3-Omni-30B-A3B-GGUF


antirez 149 days ago

Refusing on the grounds that AI often writes suboptimal GGML kernels looks very odd to me. Those who usually write GGML kernels by hand could very easily steer the model into writing excellent kernels. They could even compile a document for the agents with instructions on how to do great work. If they continue this way, a llama.cpp fork will soon emerge that is developed much faster and is potentially even better: it is unavoidable.


rjh29 149 days ago

The refusal is probably because OP said "100% written by AI" and didn't indicate an interest in actually reviewing or maintaining the code. In fact, a later PR comment suggests that the AI's approach was needlessly complicated.


hirako2000 149 days ago

It's also because it's a large PR, and because the maintainer has better things to do than spend more time and energy reviewing it than the author spent writing it. The likely outcome is a request for multiple optimisations that the author may not be able to take on.

The creator of llama.cpp can hardly be suspected of being reluctant about, or biased against, GenAI.


adefa 149 days ago

Absolutely -- it's perfectly understandable. I wanted to be completely upfront about AI usage and while I was willing and did start to break the PR down into parts, it's totally OK for the maintainers to reject that too.

I wanted to see if Claude Code could port the HF / MLX implementation to llama.cpp and it was successful -- in my mind that's wild!

I also learned a ton about GPU programming, how omni models work, and refined my approach to planning large projects with automated end to end integration tests.

The PR was mostly to let people know about the code and weights, since there are quite a few comments requesting support:

https://github.com/ggml-org/llama.cpp/issues/16186


hirako2000 149 days ago

Consider a fork while optimizing. If Claude can optimize it, then you could prove someone wrong and get it merged.

Nice work getting multimodal in there already.


nickandbro 149 days ago

I wonder if some of the docs from https://app.wafer.ai/docs could be used to make the model be better at writing GGML kernels. Interesting use case.
