Hacker News
by akawry 311 days ago
Take a look at ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp

Its CPU performance is much better than mainline llama.cpp, and it has more quantization types available.