by lexarflash8g
270 days ago
Just tested this on a rather simple issue. Basically, it falls into rabbit holes just like the other models and tries to brute-force fixes through overengineering and trial and error. It also says "your job should now pass" after maybe 10 prompts of doing roughly the same thing, stuck in a thought loop. A GH Actions pipeline was failing because a CI job had no source code files; the error was "No build system detected". Using Cursor agent with Sonnet 4.5, it would try to add dummy .JSON files, set parameters in the workflow YAML file to false, and even set parameters that don't exist. The simple solution was to just override the logic in the step with "Hello world" to get the job to pass. I don't understand why the models are so bad at simple, outside-the-box solutions. It's like a 170 IQ savant who can't even ride public transportation.
There is no outside the box in latent space. You want something a plain LLM can’t do by design, but it isn’t out of the question that it can step outside of its universe by random chance during inference and thanks to in-context learning.