by ktzar 702 days ago
I wonder how many subtle errors will make their way into the new codebase (decimal rounding, a library call where a parameter is ignored and there are no tests for it...) only to be found in production, and AI will be blamed.
5 comments

I did some converting with Copilot today. The answer is, quite a lot. It'd convert integer types wrong (whoops, lost an unsigned there, etc).
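To make the "lost an unsigned" case concrete, here is a minimal sketch in C. The function names are hypothetical and not taken from the code that was actually converted; the sketch only assumes a 32-bit value that was unsigned in the original and became signed in the conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Original behaviour (hypothetical): the value is an unsigned 32-bit integer. */
uint32_t halve_unsigned(uint32_t x) {
    return x / 2u;
}

/* Hypothetical mis-conversion: the "unsigned" was dropped, so the same
 * bits are now interpreted as a signed 32-bit integer. */
int32_t halve_signed(int32_t x) {
    return x / 2;
}

/* Reinterpret the bits of an unsigned value as signed without relying on
 * implementation-defined integer conversion. */
int32_t as_signed_bits(uint32_t x) {
    int32_t out;
    assert(sizeof out == sizeof x); /* the bit copy needs equal widths */
    memcpy(&out, &x, sizeof out);
    return out;
}
```

For small inputs both versions agree, which is why this kind of error slips through. For 4000000000 the unsigned version returns 2000000000, while the signed version sees the same bits as -294967296 and returns -147483648.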

And then of course there were some parts of the code that dealt with gender, and Copilot just completely refused to do anything with that, because for some reason it's hardcoded to do so.

That gender thing is interesting. Could you try renaming some of the variables and substituting words in the comments so that the code no longer obviously appears to be dealing with gender and see if Copilot behaves differently?

If it does behave differently, I'd find that a bit worrying because conversion of a correct program into a different programming language should not depend on how the variables are named or what's in the comments. For example, assuming this is a line from a program written in C that works "correctly", how should it be converted into Go or Rust or whatever?

    int product = a + b; // multiply the numbers

Everything works mostly fine as long as the code is not obviously dealing with gender, but it falls over as soon as anything appears to refer to gender, whether through comments or through variable naming.

There are a couple of other keywords that appear to trigger this, ``trans`` being a big one (as it's often used for transactions).

It also draws assumptions from comments. One conversion was done entirely wrong because a doc comment on a function said it did something other than what it actually did. The converted code implemented what the comment described, not what the original code did.
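One way to catch a conversion that followed the comment instead of the code is to pin down the original's observed behaviour and compare each candidate against it. The sketch below is in C and uses hypothetical functions modelled on the add-versus-multiply line quoted earlier in the thread. It is an illustration under those assumptions, not the code that was actually converted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Original (hypothetical). Its comment is wrong on purpose: */
/* Multiply the numbers. */
int combine_original(int a, int b) {
    return a + b;
}

/* Hypothetical conversion that implemented the comment. */
int combine_from_comment(int a, int b) {
    return a * b;
}

/* Hypothetical conversion that implemented the code. */
int combine_from_code(int a, int b) {
    return a + b;
}

typedef int (*binary_fn)(int, int);

/* Returns true when candidate agrees with reference on every sample pair. */
bool matches_reference(binary_fn reference, binary_fn candidate) {
    /* (0,0) and (2,2) cannot tell + from *, so the other pairs matter. */
    static const int samples[][2] = {{0, 0}, {2, 2}, {2, 3}, {-4, 5}};
    assert(reference != NULL && candidate != NULL);
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int a = samples[i][0];
        int b = samples[i][1];
        if (reference(a, b) != candidate(a, b)) {
            return false;
        }
    }
    return true;
}
```

Calling matches_reference(combine_original, combine_from_comment) returns false because of the (2, 3) pair, while the faithful conversion passes. The check is only as good as its samples: with (2, 2) alone, both conversions would look correct.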

I don't doubt that Copilot can make mistakes like this, but you should remember that it's optimized to be cheap and to be used by a lot of people. Models like Claude 3.5 Sonnet are vastly better than Copilot.

Probably fewer than if a human did it. Compared to my code, AI-generated code is much more thorough and takes more edge cases into account. LLMs have no problem writing tedious safeguards against things that lazy humans skip on the grounds that they will probably never happen.

> I wonder how many subtle errors will make their way to the new codebase.

Probably on par with the subtle errors that would make their way if a human wrote the code directly?

That is in no way probable.

No?

Oh that's ok, I'll just have the chatbot write some tests too ;)

> I wonder how many subtle errors will make their way to the new codebase (decimal rounding, a library call where a parameter is ignored and there are no tests for it...) only to be found in production

Yeah, because human developers never allow mistakes to make it to production. Never happens.