| HN Mirror

by otterk10 489 days ago

I've spent the past 2 years as a fractional AI engineer, and this totally resonates with the advice I've given clients.

They usually hire me after they've gained some initial traction and are stuck on how to improve their LLM product further.

Most of the time, they don't have any evals in place. When they do, it's usually an LLM judge that reports "everything working well" against a tiny curated dataset. I then spend a couple of weeks painstakingly going through their production data and find a ton of issues they didn't know about. Their response is usually "holy crap, there are so many issues we didn't know about! We had deluded ourselves into thinking our LLM was close to perfect, when it really has a ton of issues".
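To make the gap concrete, here is a minimal sketch of the kind of check described above: draw a random sample of production traces, have a person label them, and compare those labels with the judge's verdicts. The `Trace` record, its field names, and both helper functions are hypothetical and made up for illustration; they are not from any particular eval library or from the commenter's own tooling.

```python
import random
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical record for one production interaction."""
    trace_id: str
    judge_pass: bool  # verdict from the automated LLM judge
    human_pass: bool  # verdict from a person who read the trace


def sample_traces(traces, k, seed=0):
    """Draw a random sample so the review is not limited to curated cases."""
    rng = random.Random(seed)
    return rng.sample(traces, min(k, len(traces)))


def judge_report(labeled):
    """Compare judge verdicts with human labels on the same traces."""
    if not labeled:
        raise ValueError("need at least one labeled trace")
    n = len(labeled)
    judge_rate = sum(t.judge_pass for t in labeled) / n
    human_rate = sum(t.human_pass for t in labeled) / n
    # Traces the judge passed but a human failed: the issues nobody knew about.
    missed = [t.trace_id for t in labeled if t.judge_pass and not t.human_pass]
    return {
        "judge_pass_rate": judge_rate,
        "human_pass_rate": human_rate,
        "missed_by_judge": missed,
    }
```

A large difference between `judge_pass_rate` and `human_pass_rate`, or a long `missed_by_judge` list, suggests the judge is too lenient or the curated dataset does not look like real traffic.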