Hacker News
by Moru 598 days ago
AIs are to a large degree trained on tutorial code, quick examples, how-tos and so on from the net. Code that really should come with a disclaimer: "Don't use in production, example code only."

This leads to your code being littered with problematic edge cases that you still have to learn how to fix. Or, in the worst case, you don't even notice that there are edge cases, because you just copy-pasted the code and it works for you. Your users will find those edge cases over time.

3 comments

AI is trained on all open source code. I’m pretty sure that’s a much larger source of training data than web tutorials.
Isn't tutorial-level code exactly the best practice that everyone recommends these days? You know, don't write clever code, make things obvious to juniors, don't be a prima donna but instead make sure you can be replaced by any recently hired fresh undergrad, etc.? :)
Not really. For example, tutorial code will often leave out edge cases so as to avoid confusing the reader: if you're teaching a new programmer how to open a file, you might avoid mentioning how to handle escaping special characters in the filename.
Don't forget about Little Bobby Tables! These types of tutorials probably killed the most databases over time.
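For anyone who hasn't seen the Bobby Tables problem in practice, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the helper functions, and the input string are all made up for illustration; the point is only the difference between building SQL by string concatenation (as quick tutorials often do) and passing the value as a bound parameter.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database with a single "students" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    conn.execute("INSERT INTO students (name) VALUES ('Alice')")
    conn.commit()
    return conn


def add_student_unsafe(conn, name):
    # Tutorial-style: the input is pasted straight into the SQL text.
    # executescript() runs every statement in the string, so a crafted
    # name can smuggle in a second statement.
    conn.executescript("INSERT INTO students (name) VALUES ('" + name + "');")


def add_student_safe(conn, name):
    # Parameterized query: the driver treats the name purely as data.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
    conn.commit()


def table_exists(conn):
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'students'"
    ).fetchone()
    return row is not None


# Illustrative malicious input in the spirit of the comic.
bobby = "Robert'); DROP TABLE students;--"
```

With the unsafe helper, the crafted name closes the string literal and appends a DROP TABLE statement, so the table is gone afterwards. With the safe helper, the same string is simply stored as an odd-looking name.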
Which makes me wonder: if old companies with a history of highly skilled teams trained local models, how much better would those models be at helping solve new, complex problems?
They kinda already are: source code for highly complex open source software is already in the training datasets. The problem is that tutorials are much more descriptive (why the code is doing something, how a particular function works, etc., down to the level of a single line of code), which probably means they're much easier for LLMs to interpret and are therefore weighted higher in responses.