| HN Mirror

	by segmondy 2 days ago
	The author is correct: model architectures are now much more complicated. You can see this if you use llama.cpp and follow the project. Earlier models were always fully implemented. Yet even with more contributors, as of today many of the latest models have only partial implementations. DeepSeekv3.2 isn't fully implemented, and the same goes for KimiK2.6 and GLM5.2+; DeepSeekv4 has no implementation, MiniMaxM3 is not supported yet, and Hy3-preview has no implementation. The latest models get just a bare-bones implementation, enough to run, with support for many of the advanced features missing.

2 comments

charcircuit 1 day ago

The architecture is not much more complicated. And when llama.cpp doesn't implement something, that is more likely down to its business model and financial incentives than to raw complexity.

KerrAvon 2 days ago

Indeed, there's even a (pretty solid) custom server just for DS4: https://github.com/antirez/ds4

It works very well on high-RAM Macs.

alfiedotwtf 1 day ago

I don’t know what I’m doing wrong. Everyone says ds4 is faster than a lot of models around the same size, but I’m getting 2 t/s with DSv4 vs 12 t/s with Minimax 2.7 (16 GB 5080 + 16 GB 5060 Ti + 128 GB RAM).

enduser 21 hours ago

ds4 is optimized for systems with unified memory. It works best on Apple silicon with 96 GB+ of RAM.

alfiedotwtf 19 hours ago

ah! ok, now that makes sense. Thanks for that.
