Show HN: Best setup local LLM found for a 5090 (llama.cpp fork + turboquant)
(local-llm.utop.workers.dev)
2 points
by utopman
6 days ago
Hi folks,
I found this setup for consumer hardware that seems to give great results when running locally.
- Qwen 3.6, Q6 quant
- 450K context using the turbo3 mode of a turboquant llama.cpp fork
- multimodal support

This AI-generated blog article is a kind of "report" on what I did, how I did it, and some example results. I hope it can be useful to some people.

Note: I am not much interested in having success with this article; I mainly want to share what I think is an interesting use of a 5090. I generated the blog page by telling the AI to comply with the HN "rules" and remain factual. It's definitely not perfect: it was done rather quickly and has not been properly tested beyond 265K context. Please forgive my laziness :). I am just enthusiastic right now about what can be done on a 5090.
|
In my limited testing, the Qwen3.6-35B model has been decent but not nearly as good as the Qwen3.6-27B. Running the 27B with a less aggressively quantized cache is going to be "better" for anyone using it for software dev.
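
To put rough numbers on the cache trade-off, here is a back-of-the-envelope sketch of KV cache size versus bits per stored value. The layer count, KV head count, and head dimension are placeholders I made up for illustration, not the real Qwen3.6 config, and "~3-bit" is only my assumption for what a 3-bit cache mode would store. It also ignores per-block scale/metadata overhead, so treat the output as an order-of-magnitude estimate.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bits_per_value):
    """Approximate KV cache size in bytes.

    Counts one key and one value vector per token, per KV head, per layer
    that keeps a full attention cache. Quantization block overhead
    (scales, zero points) is ignored.
    """
    values = 2 * n_layers * n_kv_heads * head_dim * n_tokens  # 2 = K and V
    return values * bits_per_value / 8


# Hypothetical architecture for illustration only, NOT the real Qwen3.6 config.
ARCH = dict(n_layers=32, n_kv_heads=4, head_dim=128)

for label, bits in [("16-bit", 16), ("8-bit", 8), ("~3-bit", 3)]:
    gib = kv_cache_bytes(450_000, bits_per_value=bits, **ARCH) / 2**30
    print(f"{label:>7}: {gib:6.1f} GiB at 450K tokens")
```

The size scales linearly with both context length and bits per value, which is why a very long context on a single card pushes you toward a heavily quantized cache, and why a shorter context leaves room for a less quantized one.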