Hacker News
by scuff3d 97 days ago
Had the same problem with a Python project. Just for the hell of it I tried to have it implement a simple version of a proxy I've made in the past. What was finally produced "technically" worked, but it was a mess. It suppressed exceptions all over the place, it did weird shit with imports it couldn't get to work, and the way it managed connection state was bizarre.

It has a third-year college student's approach to "make it work". It can't take a step back and reevaluate a situation, or determine a new path forward. It just hammers away endlessly with whatever it's trying until it can technically be called "correct".


When I benchmark LLMs on text adventures, they reason like four-year-olds but have the world's largest vocabulary and infinite patience. I'm not surprised this is how they approach programming too.

>It has a third-year college student's approach to "make it work". It can't take a step back and reevaluate a situation, or determine a new path forward. It just hammers away endlessly with whatever it's trying until it can technically be called "correct".

OH! Yeah, I think this is the exact bad feeling I've gotten whenever I've tried testing these things before, except without clear and useful feedback like compiler error messages or something. I remember when I used to code/learn like that early on and... it's not fun now. I also don't think it's really solvable.

Yeah, it's really funny to watch. They'll get stuck on a specific method call or a specific import, even if you tell them to read the docs. Doesn't matter if there's a better approach, or that method only exists for some obscure edge case, or the implementation runs counter to the design of the API. If they can hammer the round peg into the square hole, they'll do it.

They also just... ignore shit. I have explicit rules in the repo I'm using an agent for right now that say it is for planning and research only, and that unless asked specifically it should not generate any code. It still tries to generate code 2 or 3 times a session.