Hacker News
by Octoth0rpe 47 days ago
> An over-engineered solution (complete with CLI, storage backend, documentation, unit tests) for a trivial problem which that person would've solved by an elegant bash one-liner only 3 years ago.

Importantly, I think AI companies are motivated towards the overengineered solutions as they increase the buyer's token spend. I'm not sure how we can create incentives that optimize for finding the 'right' solution, which may be the cheapest (the bash one-liner). Perhaps a widely recognized but not overly optimized for benchmark for this class of problems?

4 comments

> Importantly, I think AI companies are motivated towards the overengineered solutions as they increase the buyer's token spend.

Yes that, and also, the more complicated the solution, the more likely no one reads or reviews it too carefully, and will instead depend on an LLM to ‘read’ and ‘review it’

Even ignoring token costs, there’s a high incentive for LLMs to generate complex solutions, because those solutions generate demand for further LLM use. (You don’t really want to review that 30,000 line pull request by hand, do you?)

This reminds me of this famous quote by Tony Hoare:

    "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies."

> Yes that, and also, the more complicated the solution, the more likely no one reads or reviews it too carefully, and will instead depend on an LLM to ‘read’ and ‘review it’

Exactly right. It's the other end of the bikeshed continuum[1]. If you send out a two-page design doc or a hundred-line pull request, the recipient will actually review it. Let AI inflate that to ten pages or a thousand lines of code, and they feel like they don't have enough mental capacity to tackle it, so they let it slide.

[1]: https://bikeshed.com/

> Perhaps a widely recognized but not overly optimized for benchmark for this class of problems?

I don't see how this could be achieved.

Any widely-recognized benchmark is going to be gamed by the genAI companies.

They have a strong financial incentive to do so, and their products' nature shows that they are not influenced by ethical or societal-good incentives.

I dunno, on a subscription one would assume that minimizing token spend would actually be in their interest. Even for API calls I'm not entirely convinced they're profitable.

I think the model space is too competitive. People will switch if another model is significantly better.

There are only a few frontier models, and aren’t they all operating under the same incentives?

Open-source models maybe not necessarily, as they can (in theory) be self-hosted.

I think right now the incentive for open-source Chinese model developers is to provide good (comparable to SotA) and cheap models, so that the space is not captured by a few private American companies, because they've seen how hard it is to compete in the space when that happens.