by andy99 189 days ago
What do you mean about not doing evals? Just literally that you don’t run any benchmarks, or do you have something against them?

He's just saying anecdotally these models are good. A reasonable response might be "have you systematically evaluated them?". He has pre-answered - no.
Not OP, but perhaps they mean not putting too much faith in common benchmarks (thanks to benchmaxxing).
Yes to both comments. I said that to:

1. disclose that my method was not quantifiably measurable as the not model, because that is not important to me (speed of action and development outcomes matter more to me), and because

2. I’ve observed a large gap between benchmark toppers and my own results

But make no mistake, I like to have the terminals scrolling live across multiple monitors so I can glance at them periodically and watch their response quality, so I do care, and I notice which ones give better or worse results.

My biggest goal right now after accuracy is achieving more natural human-like English for technical writing.