| I'm still a bit confused because it says "All uploads use Unsloth Dynamic 2.0" but then when looking at the available options like for 4 bits there is: IQ4_XS 5.17 GB, Q4_K_S 5.39 GB, IQ4_NL 5.37 GB, Q4_0 5.38 GB, Q4_1 5.84 GB, Q4_K_M 5.68 GB, UD-Q4_K_XL 5.97 GB And no explanation for what they are and what tradeoffs they have, but in the turorial it explicitly used Q4_K_XL with llama.cpp . I'm using a macmini m4 16GB and so far my prefered model is Qwen3-4B-Instruct-2507-Q4_K_M although a bit chat but my test with Qwen3.5-4B-UD-Q4_K_XL shows it's a lot more chat, I'm basically using it in chat mode for basic man style questions. I understand that each user has it's own specific needs but would be nice to have a place that have a list of typical models/hardware listed with it's common config parameters and memory usage. Even on redit specific channels it's a bit of nightmare of loot of talk but no concrete config/usage clear examples. I'm floowing this topic heavilly for the last 3 months and I see more confusion than clarification. Right now I'm getting good cost/benefit results with the qwen cli with coder-model in the cloud and watching constantly to see when a local model on affordable hardware with enviroment firendly energy comsumption arrives. |
Q4_0 and Q4_1 were supposed to provide faster inference, but tests showed it reduced accuracy by quite a bit, so they are deprecated now.
Q4_K_M and UD-Q4_K_XL are the same, just _XL is slightly bigger than _M
The naming convention is _XL > _L > _M > _S > _XS