by dizhn 165 days ago
Try some Made In PRC models. They do not give a shit.

I have tried a few Qwen-2.5 and 3.0 models (<=30B), even abliterated ones, but it seems that some words have been completely wiped from their pretraining dataset. No amount of prompting can bring back what has never been there.

For comparison, I have also tried the smaller Mistral models, which have a much more complete vocabulary, but their writing sometimes lacks continuity.

I have not tried the larger models due to lack of VRAM.

You can give their hosted versions a go using one of the free CLIs. (The Qwen coder CLI has Qwen models; opencode has a selection that changes all the time, and it was GLM recently. There's also DeepSeek, which is quite cheap.)