Hacker News new | ask | show | jobs
by memhole 462 days ago
Do you mostly stick with smaller models? I’m pretty surprised at how good the smaller models can be at times now. A year ago they were nearly useless. I kind of like too the hallucinations are more obvious sometimes. Or at least it seems like they are.
2 comments

I like the smaller models because they are faster. I even got a Llama 3 1B model running on TinkerBoard 2S and it was fun to play around with and not too slow. The smaller models are still good at summarizing and other basic tasks. For coding they start showing their limits but still work great for trying to figure out issues in small bits of code.

The real issue with local models is managing context. smaller models let you have a longer context without losing performance but bigger models are smarter but if you want to keep it fast I have to reduce the context length.

Also all of the models have their own "personalities" and they still manifest in the finetunes.

Yeah, that’s why I like the smaller models too. Big context windows and intelligent enough most of the time. They don’t follow instructions as well as the larger models ime. But then on the flip side the reasoning models struggle to deviate. I gave deepseek an existential crisis by accident the other day lol.

Agreed on personalities. Phi, I think because of the curated training data comes across as very dry.

I still find them useless. What do you use them for?