Hacker News
by segmondy 2 days ago
The author is correct: model architectures are now much more complicated. You can see this if you use llama.cpp and follow the project. Earlier models were always fully implemented. Yet even with more contributors, as of today many of the latest models have only partial implementations. DeepSeekv3.2 isn't fully implemented, and the same goes for KimiK2.6 and GLM5.2+; DeepSeekv4 has no implementation, MiniMaxM3 is not supported yet, and Hy3-preview has no implementation. Support for the latest models is bare-bones: enough to run them, with many of the advanced features missing.
2 comments

The architecture is not much more complicated. And when llama.cpp doesn't implement something, that is more likely down to its business model and financial incentives than to raw complexity.
indeed, there's even a (pretty solid) custom server just for DS4 https://github.com/antirez/ds4

-- works very well on high-RAM Macs

I don’t know what I’m doing wrong. Everyone says ds4 is faster than a lot of models around the same size, but I’m getting 2 t/s with DSv4 vs 12 t/s with Minimax 2.7 (16GB 5080 + 16GB 5060 Ti + 128GB RAM).
ds4 is optimized for systems with unified memory. It works best on Apple silicon with 96GB+ of RAM.
ah! ok, now that makes sense. Thanks for that.