| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by int_19h 422 days ago

I used to use them mostly in "smart code completion" mode myself until very recently. But with all the AI IDEs adding agentic mode, I was curious to see how well that fares if I let it drive.

And we aren't talking about trivial bugs here. For TypeScript, the most impressive bug it handled to date was an async race condition due to missing await causing a property to be overwritten with invalid value. For that one I actually had to do some manual debugging and tell it what I observed, but given that info, it was able to locate the problem in the code all by itself and fix it correctly and come up with a way to test it as well.

For C++, the codebase in question was gdb, the bug was a test issue, and it correctly found problematic code based solely on the test log (but I had to prod it a bit in the right direction for the fix).

I should note that this is Gemini Pro 2.5 specifically. When I tried Google's models previously (for all kinds of tasks), I was very unimpressed - it was noticeably worse than other SOTA models, so I was very skeptical going into this. Indeed, I started with Sonnet precisely because my past experience indicated that it was the best option, and I only tried Gemini after Sonnet fumbled.

1 comments

Asraelite 422 days ago

I use it for basically everything I can, not just code completion, including end-to-end bug fixes when it makes sense. But most of the time even the current Gemini and Claude models fail with the hard things.

It might be because most bugs that you would encounter in other languages don't occur in the first place in Rust because of the stronger type system. The race condition one you mentioned wouldn't be possible for example. If something like that would occur, it's a compiler error and the AI fixes it while still in the initial implementation stage by looking at the linter errors. I also put a lot of effort into trying to use coding patterns that do as much validation as possible within the type system. So in the end all that's left are the more difficult bugs where a human is needed to assist (for now at least, I'm confident that the models are only going to get better).

int_19h 422 days ago

Race conditions can span across processes (think async process communication).

That said I do wonder if the problems you're seeing are simply because there isn't that much Rust in the training set for the models - because, well, there's relatively little of it overall when you compare it to something like C++ or JS.