HN Mirror


by CapsAdmin 1142 days ago

Your polaroid example would require someone to write code that does that one specific thing. You could also argue that this would violate copyright if it was trained on some photographer's specific unique style, made as an app and marketed as being able to mimic the photographer's style. But in your example you have 1000 random polaroid images of unknown origin, so somehow it becomes abstract enough that it doesn't become an issue.

In your Stephen King example I would say it's still learned, because the "code" is a general language model that can learn anything. It's just that you decided to train it only on Stephen King novels. If you have an image model that was trained entirely on public domain images and you finetune it to replicate a specific artist's style, I would personally think the finetuned model and its creator are maybe violating copyright.

But when it comes to learning, I would say that when you write a program whose purpose is to learn the next word or pixel, but it's up to the computer to figure out how to do that, the computer is learning when you feed it input data. It's the program's job to figure out the best way to predict, not the programmer's. (It's not that black and white, given that the programmer will also sometimes guide the program, but you get the idea.)

When you write a program that does one or several things, it's not learning.

I think it's something to do with the difference between emergent behavior from simple rules and intentional behavior from complex rules.


flumpcakes 1142 days ago

I think you're using fancy language like "general language model" to obscure the facts.

If I created a program to read words from the input and assign weights based on previous words, I could feed in any data. Just like the polaroid example. (I suggested that the polaroid example was abstract enough not to be an ethical/legal problem because I believe it is mostly transformative, unless the colours themselves were copyrighted or a distinct enough work in themselves.)

Now if I only feed in Stephen King books and let it run, suddenly it outputs phrases, wording, place names, character names, and adjectives all from Stephen King's repertoire. Is this a 'general language model'? Should this be copyright exempt? I don't think this is transformative enough at all. I've just mangled copyrighted works together, probably not enough to stand up against a copyright claim.
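To make the program described above concrete, here is a minimal sketch, assuming "assign weights based on previous words" means counting which word follows which (a bigram model). The function names `train_bigram` and `generate` and the toy corpus are made up for illustration; this is not any real product's method, and real language models are far larger and work differently in detail.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    words = text.split()
    weights = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        weights[prev][nxt] += 1
    return weights

def generate(weights, start, length, seed=0):
    """Sample words one at a time, in proportion to the follow counts."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = weights.get(out[-1])
        if not followers:
            break  # the last word never had a successor in the input
        choices, counts = zip(*followers.items())
        out.append(rng.choices(choices, weights=counts, k=1)[0])
    return out

# Toy corpus (an assumption for illustration); any text can be fed in.
corpus = "the dog saw the cat and the cat saw the dog"
model = train_bigram(corpus)
sample = generate(model, "the", 8)
print(" ".join(sample))
```

The code itself is generic, but every word and every word pair it can output comes straight from whatever text it was fed, which is the point in dispute here.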

I think people use AI and ML as buzzwords to try and obfuscate what's actually happening. If we were talking about AI and ML that doesn't need training on any licensed or copyrighted work (including 'public domain') then we can have a different conversation, but at the moment it's obscured copyright theft.

CapsAdmin 1141 days ago

I can agree it's obscure in the sense that we shrug when asked about how it works. If you specifically train a model to mimic a specific style I can get behind it leaning more towards theft, or at least being immoral regardless of laws.

If you train a model to replicate 10000 specific artists, I could also get behind it being more like theft.

But if the intention was to train with random data (and some of it could be copyrighted) just like your polaroid example to generate anything you want, I'm not so sure anymore.

I feel the intent is the most important part here. But then again I don't know the intent behind these companies, and I guess you don't either. Maybe no single person working in these companies knows the intent either.

It also gets murky when you have prompts that can refer to specific artists and when people who use the models explicitly try to copy an artist's style. In the case of Stable Diffusion, if the CEO's to be believed, the CLIP model had learned to associate images of Greg Rutkowski and other artists with images that were not theirs but in a similar style. [0]

Even murkier is when you have a base model trained on public data, but people finetune at home to replicate some specific artist's style.

[0] https://twitter.com/EMostaque/status/1571634871084236801