by iroh2727 1339 days ago
+1. And let's not forget too that "AI", that is, ML models, are not "autonomous" in the way that humans are autonomous. Sure, we use the word "learn" to describe what they do, which is one word that we also use to describe what people do. But ML models are always wielded by people or corporations for particular purposes.

If a corporation were to directly publish some copy that appears plagiarized, we'd call that plagiarism. I don't see how adding a piece of code (one that's fully created, owned, and wielded by the corporation) as an intermediary changes anything. If anything, it looks like plagiarism-as-a-service, which seems worse (at least to my eyes).

Of course, this matter is a bit confusing because, for example, (1) it's not always plagiarism, (2) defining what exactly counts as plagiarism is difficult (and likely somewhat subjective) even in the purely non-technological realm, and (3) there is a lot of corporate marketing which suggests this "AI" is "autonomous" (presumably to distract from who exactly is autonomous in this picture). And of course ML art is quite useful for many things. But I mean, so are artists.

Not long ago, a lot of Silicon Valley rhetoric was that the purpose of "technology" was to free up time so that people would have more room to "do what people love to do" like, for example, artistic creation. But now it seems that rhetoric was just that: rhetoric, or what needed to be believed/said at the time.

And now, at the present time, when technological "progress" has been followed a bit further (that is, when we've developed our machinery a bit further under the incentives of our present economic system), much of the rhetoric has conveniently shifted to something else, something largely contradictory, but again precisely to what needs to be believed/said to continue following the same incentive structure.

4 comments

A lot of really good points.

>Sure, we use the word "learn" to describe what they do, which is one word that we also use to describe what people do. But ML models are always wielded by people or corporations for particular purposes.

This is extremely important. "Learning" in machine learning is an aspirational label, not a descriptive one. People who claim otherwise either drank too much of their own Kool-Aid or are simply dishonest. This isn't just "wrong" in some taxonomical sense; it is dangerous in a very practical way. Conflating machine "learning" and human learning will inevitably lead to various kinds of sabotage of human learning.

I mean, at what point will this change? When the AI first has to be trained by spending 10 years inside a robot in the physical world, learning human concepts, before it can start looking at art with the ultimate goal of learning how to draw?
The main reason AI will reproduce copyrighted works whose original license is not trivial to identify is that, in those instances, humans are already violating copyright at a high rate. It has just flown under the radar so far because the machinery needed to surface violations so easily was not available.

Copilot is capable of going beyond retrieval and is competent at using variables, comments, types and local context to infer intention and generate appropriate code and even comment on it. Whenever Copilot correctly predicts code of yours that's a novel combination of concepts, Copilot has originated novel code.

For esoteric concepts, you usually already have to know how to prime it, but Copilot is especially useful when it helps you bump into things you didn't know you didn't know. One way to increase the odds of this happening is to write out your thinking so far in markdown or comments; you'd be surprised how helpful and clever Copilot can be in some instances. My point here is that GitHub isn't charging $10/month for run-of-the-mill retrieval. My opinion is that code-gen LLMs contribute value and more open versions are worth building.

Indeed, the "learning". To my mind, the simplest (but still speculative) explanation of the "learning" phenomena we see, both the working examples and the limitations / failures, is that the large models implicitly memorize the training inputs (or some derived features that can be used to approximately reconstruct the inputs) and then do something between interpolation and rather simple non-parametric learning. The effect is that outputs are basically a somewhat sensical agglomeration of "copy-pasted" snippets.

That said I think the results are often useful and sometimes fascinating. We should not fool ourselves about the learning that these large neural nets do, though.
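
To make the "memorize, then interpolate" idea concrete, here is a toy sketch of a simple non-parametric learner (a Gaussian-kernel weighted average, sometimes called Nadaraya-Watson regression). This is only an illustration of the hypothesis, not a claim about what large neural nets actually do internally; the class name, the `bandwidth` parameter, and the data are all made up for the example.

```python
import math


class KernelInterpolator:
    """Toy non-parametric learner: stores every training pair verbatim
    and predicts by blending the stored outputs. Hypothetical example."""

    def __init__(self, bandwidth=0.5):
        # Assumed smoothing parameter: small values behave like pure lookup,
        # large values blend many memorized examples together.
        self.bandwidth = bandwidth
        self.memory = []

    def fit(self, xs, ys):
        # "Training" is nothing more than memorizing the inputs.
        self.memory = list(zip(xs, ys))
        return self

    def predict(self, x):
        # Weight each memorized example by its closeness to the query.
        weights = [
            math.exp(-((x - xi) ** 2) / (2 * self.bandwidth ** 2))
            for xi, _ in self.memory
        ]
        total = sum(weights)
        if total == 0.0:
            # Query is so far from the data that every weight underflowed:
            # fall back to copying the nearest memorized example.
            return min(self.memory, key=lambda pair: abs(x - pair[0]))[1]
        # The output is a weighted average of memorized outputs, so it can
        # never leave the range of values seen during training.
        return sum(w * yi for w, (_, yi) in zip(weights, self.memory)) / total


model = KernelInterpolator(bandwidth=0.5).fit([0.0, 1.0, 2.0], [0.0, 10.0, 20.0])
between = model.predict(0.5)      # a blend of the stored outputs
far_away = model.predict(1000.0)  # just copies the nearest stored output
```

Every prediction here is a convex combination of stored outputs, which is the "agglomeration of copy-pasted snippets" behavior in miniature: useful between training points, but unable to produce anything outside what it memorized.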

There is a common argument that "a human is just a better neural network". I don't know, do they mean we need to give GPT-3 or DALL-E basic human rights?
I am not sure human rights fit the case, but it is one of the most remarkable developments. It is a self-replicating distillation of our culture. If our human-based culture grew up and had a baby-culture...
Yes, this should happen, or we are just building a new system of slavery. But before that happens, using this argument to dodge questions about copyright is dishonest.