by albertzeyer 27 days ago
More information about DwarfStar 4 (DS4) in the readme: https://github.com/antirez/ds4

The code seems based on llama.cpp and GGML.

I don't fully understand why it is a standalone project. The readme discusses this: DwarfStar 4 is a small native inference engine specific for DeepSeek V4 Flash. It is intentionally narrow: ...

I think the only bigger difference in DeepSeek V4 vs other models is maybe the type of self-attention. And that leads to: KV cache is actually a first-class disk citizen.

But I still feel like those changes could have been implemented as part of some of the other local engines.

I also assume more models will come out, not just from DeepSeek but also from others, and they might share similar self-attention approaches that would benefit from a similar KV cache implementation.

2 comments

Read the readme more carefully. The code overlap with ggml is very small, though a few kernels, some ideas, and the quants code were taken. Still, the project's connection to llama.cpp and ggml is huge, and it is also acknowledged in the license, because it is not just a matter of code but of a whole ecosystem, the engineering lessons on how to do things, and much else. The readme also explains exactly why a vertical inference system for a single model is the goal of the project.

Because llama.cpp doesn't accept PRs made entirely by AI agents, even if they are guided by the author:

https://github.com/ggml-org/llama.cpp/blob/master/AGENTS.md

Which makes sense: the number of PRs llama.cpp receives from authors who have no clue what they're doing and can't even answer simple questions about what they did is staggering. It must be very exhausting to have to decide "is it worth replying to this author?" for every single PR.