Ask HN: How do you personally evaluate new LLM models?
2 points by _samjarman 322 days ago
Hey folks, how do you personally evaluate new LLMs? Vibes? Or do you have some tests you like to run? Or do you just use them in your IDE/text interface for a bit and see how it feels? I know we could probably trust some of the more public benchmarks, but I'm curious about personal evaluation techniques. Thanks!
I also have one seat-of-my-pants test: "give me a story", themed around whatever my kid likes lately.
Overall, from my testing, the good players like Claude get it right on the first go. Amazing. But I don't mind giving it feedback; what matters is how many times I need to re-correct it. qwen-coder needed an excessive number of corrections.
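If you want to track the "how many times did I have to re-correct it" number rather than just remember it, here is a rough sketch of how that could be tallied. Everything in it is an assumption for illustration: the `summarize_corrections` helper, the task names, the `model_a`/`model_b` placeholders, and the counts are all made up, not real results.

```python
from collections import defaultdict
from statistics import mean


def summarize_corrections(log):
    """Summarize a hand-kept log of (model, task, corrections) tuples.

    `corrections` is how many follow-up fixes were needed before the
    answer was acceptable; 0 means the model got it right on the first go.
    """
    per_model = defaultdict(list)
    for model, _task, corrections in log:
        per_model[model].append(corrections)

    return {
        model: {
            "mean_corrections": mean(counts),
            "first_try_rate": sum(c == 0 for c in counts) / len(counts),
        }
        for model, counts in per_model.items()
    }


# Hypothetical entries; replace with your own notes.
log = [
    ("model_a", "kid_story", 0),
    ("model_a", "refactor_function", 1),
    ("model_b", "kid_story", 2),
    ("model_b", "refactor_function", 4),
]

summary = summarize_corrections(log)

# Fewest corrections first.
for model, stats in sorted(summary.items(), key=lambda kv: kv[1]["mean_corrections"]):
    print(model, stats)
```

This only counts corrections; it says nothing about how painful each one was, so it's a complement to vibes, not a replacement.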