by lexarflash8g 270 days ago
Just tested this on a rather simple issue. Basically, it falls into rabbit holes just like the other models and tries to brute-force fixes by overengineering through trial and error. It also says "your job should now pass" after maybe 10 prompts of doing roughly the same thing, stuck in a thought loop.

A GitHub Actions pipeline was failing because a CI job didn't have any source code files; the error was "No build system detected". Using Cursor agent with Sonnet 4.5, it would try to add dummy .json files, set parameters in the workflow YAML file to false, and even set parameters that don't exist. The simple solution was to just override the logic in the step with "Hello world" to get the job to pass.

I don't understand why the models are so bad with simple thinking outside the box solutions? It's like a 170 IQ savant who can't even ride public transportation.

2 comments

> why the models are so bad with simple thinking outside the box solutions

There is no outside the box in latent space. You want something a plain LLM can’t do by design, but it isn’t out of the question that it can step outside of its universe by random chance during the inference process and thanks to in-context learning.

They're very good at things that have been done a million times before. I use both Claude and Gemini, and they are pretty terrible at writing any kind of Vulkan shader but really good at spitting out web pages and small bits of code here and there. For me that's enough to make them useful.