| The problem is I'm not doing anything as complicated as what you're describing. The task was/is to take a grammar for APL from some long-forgotten paper and turn it into a Lemon parser. Easy peasy, well within its wheelhouse, and it had spectacular initial results with the help of DeepSeek-R1 analyzing its work. "Oh, good job, robot," me types, "let's work on a lexer. Hmm... you seem to have clipped out some important rules at some point, we need to add those back." Then, boom, Claude is completely worthless. I want Claude to succeed. It was doing so well, then it hit a self-reinforcing wall of failure that it just can't get over, even though it can analyze its behavior and say exactly why it keeps failing. I mean, exactly zero people think the world needs an APL interpreter written by the robots, but the point of the project is to see how far they can get without having a human write a single line of code. I know they have limitations and have no problem helping them work around them. But, alas, this project is shelved until the next big hype cycle. |