Hacker News
by lairv 1258 days ago
I feel like people on HN have different opinions about AI training when they are the ones concerned (code, blogs) than when others are concerned (art). I've seen posts saying it was normal for AI to train on scraped art data, but offended when AI trains on their GitHub code

My opinion is that if something is publicly available on the internet, then an AI should be allowed to train on it

3 comments

My concern is not so much the ethics, more the security issues.

The AI doesn't grok the code, it just copies it. This is fine for art, because accuracy isn't required. It's not fine for code, because code needs to be accurate.

Generating code using an AI is going to lead to vulnerabilities which were either present in the original training code, or have been created by mis-applying training code.

Granted, junior devs (hell, even senior devs) can and will make the same mistakes, but at least someone understands the code and can fix the vulnerability relatively easily once it has been exposed. The AI doesn't understand that it made a mistake, and has no idea how to fix it.

True to some extent, although I can see some reasonable arguments as to why. For one, I don't expect AI art to ever replace humans for higher end creative expression - quite simply, part of the value of art is the time put in by a skilled craftsman, like the way a handmade Rolex is worth more than watches that are more accurate but mass-produced. Similar to Pixar and animation, I believe AI may change the style of art but not the fundamental demand for it.

Coding is far more utilitarian - so if an AI is taking others' original ideas and effectively passing them off as its own, companies become less likely to feel they need to employ the originator.

I get your point, but if your code was already publicly available, then other humans could already take your original idea by reading your code and effectively pass it off as their own. It's true that AI makes this process faster and more automatic, though.
Other humans could but, if they did so in violation of the terms of the license, I could sue them. Microsoft acknowledged that Copilot ignores the license altogether.
When a human does this today, it's recognized as plagiarism and possibly copyright infringement or even fraud. Why does 'AI-washing' make it alright?
> I feel like people on HN have different opinions about AI training when they are the ones concerned (code, blogs) than when others are concerned (art). I've seen posts saying it was normal for AI to train on scraped art data, but offended when AI trains on their GitHub code

Are you sure it was the same individual? Because otherwise all you've observed is two people with two different opinions.

Yeah, it's only the impression I had reading AI-related comments for the past few months, and it might just be opinions from two separate groups of people