by philipp-gayret
37 days ago
Nice work! I spent a week working very extensively with all kinds of local models on an NVIDIA Spark. Quantized Gemma and Qwen somewhat shine, but the overall results compared to, say, Claude Haiku were so disappointing (in the context of tool calling) that I ended up returning the hardware. I'm curious how the same local models and benchmarks I have will hold up, so I will try this.
|
The key, I think, is to look at the use cases you have that aren't big monsters: auditing logs, home assistant, reading and summarizing news RSS feeds, etc. That's stuff that's fairly bite-sized per task but high volume. Then the local models make sense, and they just need mechanical reliability to close the gap.