Hacker News
by gibbitz 81 days ago
This is indicative of too much context. Remember that these systems don't "think"; they predict. If you think of the context as an insanely large map with shifting and duplicate keys and queries, the hallucinations and apparent loss of context make sense. Find ways to reduce the context for better results: reduce sample sizes, and exclude unrelated repositories and code. Remember, too, that more context means more cost, and when the AI investment money dries up, this will be untenable for developers.
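One way to act on "exclude unrelated repositories and code" is to filter what goes into the context before the agent ever sees it. The sketch below is illustrative only and not any particular tool's behavior. It assumes a hypothetical `SourceFile` record, hypothetical directory prefixes, a caller-supplied list already ranked by relevance, and a rough heuristic of about 4 characters per token. Files outside the relevant directories are dropped, and files that would blow the token budget are skipped.

```python
from dataclasses import dataclass


@dataclass
class SourceFile:
    path: str  # hypothetical repo-relative path
    text: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; real tokenizers vary by model.
    return max(1, len(text) // 4)


def select_context(files, relevant_prefixes, budget_tokens):
    """Keep only files under the relevant directories, in the caller's
    (assumed relevance-ranked) order, skipping any that exceed the remaining budget."""
    prefixes = tuple(relevant_prefixes)
    chosen, used = [], 0
    for f in files:
        if not f.path.startswith(prefixes):
            continue  # unrelated code never enters the context
        cost = estimate_tokens(f.text)
        if used + cost > budget_tokens:
            continue  # too big for what's left of the budget
        chosen.append(f)
        used += cost
    return chosen, used


if __name__ == "__main__":
    files = [
        SourceFile("api/login.py", "x" * 400),       # ~100 tokens
        SourceFile("api/accounts.py", "x" * 800),    # ~200 tokens
        SourceFile("games/slots.py", "x" * 4000),    # outside "api/", excluded
        SourceFile("api/big_module.py", "x" * 2000), # ~500 tokens, over budget
    ]
    chosen, used = select_context(files, ["api/"], budget_tokens=400)
    print([f.path for f in chosen], used)  # ['api/login.py', 'api/accounts.py'] 300
```

Skipping (rather than stopping at) an oversized file lets smaller, still-relevant files fill the remaining budget. If nothing relevant fits, that is the signal the task itself is scoped too broadly.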

If you can't reduce the context, that suggests the scope of your prompt is too large. The system doesn't "think" about the best solution to a prompt; it uses logic to determine what outputs you'll accept. So if you prompt it to build an online casino website with user accounts and logins, games, bank card processing, analytics, advertising networks, etc., the agent will require more context than if you prompt for just the login page.

So to answer the question, if my agent loses context, I feel like I've messed up.


This is the first project where I've really let AI do more than work on a single file at a time. The trouble is, there's no way for it to be useful without a fairly large context. When it runs out, it starts doing things that are actively destructive, yet very subtle and easy to miss at the same time. Mainly, it forgets the architecture.

A couple of days ago, it had a good handle on a database table that I was writing side by side with an API that ran queries and did calculations on the data. I read the code it wrote for a particular API call and didn't notice that it had started flipping the sign of one of the columns in a query, because it had misinterpreted the column name. A few minutes earlier it had written another query correctly, but from that point on it kept flipping the sign on that column. I only noticed after having it write several more queries, when it oddly mentioned in its "thinking" that X was Y-Z. Reading the thinking has been my main clue that it has lost track, but if I hadn't known exactly why X was Y+Z, the code built on that API would have given subtly inconsistent results that would have been very hard to trace.
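A flipped sign like this is easier to catch with a check that recomputes the value independently of the agent's SQL than by reading the thinking. The sketch below is a hypothetical illustration using Python's built-in `sqlite3` with an in-memory database. The table `readings`, its columns `y` and `z`, and the sample rows are invented for the example; only the relationship X = Y + Z comes from the comment above.

```python
import sqlite3


def make_db():
    # Hypothetical schema and data; the real table and columns are unknown.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, y REAL, z REAL)")
    conn.executemany(
        "INSERT INTO readings (y, z) VALUES (?, ?)",
        [(10.0, 2.5), (-4.0, 1.0), (0.0, -3.0)],  # z is nonzero so a sign flip shows up
    )
    return conn


# Correct definition: X = Y + Z.
CORRECT_QUERY = "SELECT id, y + z AS x FROM readings ORDER BY id"
# The subtle bug described above: the sign of one column is flipped (X = Y - Z).
FLIPPED_QUERY = "SELECT id, y - z AS x FROM readings ORDER BY id"


def reference_x(conn):
    # Independent calculation in Python, not written by the agent.
    rows = conn.execute("SELECT id, y, z FROM readings ORDER BY id").fetchall()
    return {row_id: y + z for row_id, y, z in rows}


def check_query(conn, sql):
    """Return the ids whose query result disagrees with the reference."""
    expected = reference_x(conn)
    actual = dict(conn.execute(sql).fetchall())
    return [i for i in expected if abs(expected[i] - actual[i]) > 1e-9]


if __name__ == "__main__":
    conn = make_db()
    print(check_query(conn, CORRECT_QUERY))  # []
    print(check_query(conn, FLIPPED_QUERY))  # [1, 2, 3]
```

The test data matters: a row where `z` is 0 would give the same answer under both queries, so fixtures for this kind of check should exercise nonzero values of every column whose sign could be misread.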