by jdw64 4 hours ago
If you look at the history of software engineering, the ones that made the most money were usually not the companies that built the applications themselves, but the ones that built the tools to verify, deploy, and build them, such as CI/CD, static analysis tools, and testing frameworks.

Personally, I agree about the Goodhart problem, but don't eval startups fail because they try to sell an 'evaluation service' rather than a 'verification toolchain'? The problem, it seems, is that an AI verification toolchain ultimately requires a model, because these companies internalize AI and sell it under the name of a 'harness.'

So an AI verification (eval) toolchain would have to be structurally different. Verifying AI-generated code isn't about whether it compiles; AI code can always be made to compile. The real issues are semantic, such as overfitting to existing designs and tests. To catch those, you ultimately need to build an AI, and building that AI is expensive. So in the end, AI verification companies depend on external model providers for the core of their verification engine. I think this is a bad business decision.


> made the most money

> built the tools to verify, deploy, and build them, such as CI/CD, static analysis tools, and testing frameworks.

Curious. Which company made money with testing frameworks?

I thought about mentioning Atlassian (Jira) and JetBrains, but come to think of it, they aren't really testing frameworks. They cover the entire development workflow. I guess I didn't think it through.

The "shovels for gold miners" analogy is generally a good one. It applies to Nvidia, for example. It doesn't generally apply to developers, though. Developer tooling is notoriously difficult to monetize. Developers themselves are a shovel.

Devs are hard to market and sell to, too, I've heard. That's likely because they can build a lot of the stuff out there themselves when pressed. They have the most exposure to apps, so they're opinionated. It's why most devs take the open-source spoils while everyone else avoids GitHub in general. Although AI has made it easy to set things up locally, many people still don't see the value of fully controlling their software or AI agents the way devs do.