by rdos
80 days ago
14B, even at Q4, isn't realistic for coding on a single 12GB RTX 3060: token speed is too slow. After all, they are dense models, and you aren't getting a good MoE model under 30B. You can do OCR, STT, and TTS really well, and for LLMs, good use cases with <10B models are classification, summarization, and extraction.
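A rough way to see why "dense" matters here: single-stream decoding is mostly memory-bandwidth-bound, so each generated token has to stream every weight it uses out of VRAM. A dense model reads all of its parameters per token, while an MoE only reads its active experts. The sketch below computes that ceiling. The 360 GB/s figure is the 12GB RTX 3060's memory bandwidth; the ~4.8 bits per weight for "Q4" and the 3B-active MoE are illustrative assumptions, not measurements of any specific model.

```python
# Back-of-envelope ceiling on decode speed when generation is memory-bandwidth-bound:
# every generated token streams the weights it uses from VRAM once.

RTX_3060_BANDWIDTH_GBPS = 360.0  # 12GB RTX 3060: 192-bit bus, 15 Gbps GDDR6

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB at a given quantization."""
    return params_billion * bits_per_weight / 8

def decode_ceiling_tok_s(active_params_billion: float, bits_per_weight: float,
                         bandwidth_gbps: float = RTX_3060_BANDWIDTH_GBPS) -> float:
    """Upper bound on tokens/s: bandwidth divided by bytes read per token.
    Ignores KV-cache reads, compute, and kernel overhead, so real speed is lower."""
    return bandwidth_gbps / weight_gb(active_params_billion, bits_per_weight)

Q4_BPW = 4.8  # assumption: roughly a llama.cpp Q4_K_M-style quant

dense_14b = decode_ceiling_tok_s(14, Q4_BPW)     # dense: all 14B params read per token
moe_3b_active = decode_ceiling_tok_s(3, Q4_BPW)  # hypothetical MoE with ~3B active params

print(f"dense 14B @ Q4:      <= {dense_14b:.0f} tok/s")
print(f"MoE, 3B active @ Q4: <= {moe_3b_active:.0f} tok/s")
```

These are ceilings, not benchmarks. Coding workloads with long contexts also pay for KV-cache reads and prompt processing, which pushes real throughput further below the bound.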
Add a third one and you can run Qwen 3.5 27B at Q6 with 128k context, for less than the price of a 3090.
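The VRAM side of that is simple arithmetic. This sketch assumes a llama.cpp-style Q6_K quant at about 6.5625 bits per weight and treats GB loosely. It only budgets the weights; how much of the leftover the 128k KV cache takes depends on the model's layer count, KV heads, and attention design, which aren't given here, so that part is left as "headroom" rather than guessed.

```python
# Rough VRAM budget: three 12GB RTX 3060s vs. a 27B dense model at Q6.

CARDS = 3
VRAM_PER_CARD_GB = 12.0
Q6_BPW = 6.5625  # assumption: llama.cpp Q6_K stores ~6.5625 bits per weight

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB at a given quantization."""
    return params_billion * bits_per_weight / 8

total_vram = CARDS * VRAM_PER_CARD_GB  # pooled across cards via layer splitting
weights = weight_gb(27, Q6_BPW)        # 27B params at ~6.56 bits each
headroom = total_vram - weights        # left for KV cache, activations, per-GPU overhead

print(f"total VRAM: {total_vram:.1f} GB")
print(f"weights:    {weights:.1f} GB")
print(f"headroom:   {headroom:.1f} GB")
```

Roughly 22 GB of weights leaves about 14 GB across the three cards, minus some per-GPU runtime overhead, for the KV cache and everything else.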