
by moyix 1382 days ago

The Authors Guild v Google decision about Google Books seems relevant:

> In late 2013, after the class action status was challenged, the District Court granted summary judgement in favor of Google, dismissing the lawsuit and affirming the Google Books project met all legal requirements for fair use. The Second Circuit Court of Appeal upheld the District Court's summary judgement in October 2015, ruling Google's "project provides a public service without violating intellectual property law." The U.S. Supreme Court subsequently denied a petition to hear the case.

[...]

> The court's summary of its opinion is:

[...]

> Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google’s commercial nature and profit motivation do not justify denial of fair use.

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

This doesn't touch on the ethics of course – at minimum I think allowing people to exclude themselves or their work from a dataset is necessary.

6 comments

VanTheBrand 1382 days ago

I would argue (as the court did) that Google's use is transformative because the end result, "book search," is in a different marketplace from "books." The end result / output of these generative AI systems trained on stock media and art is... "stock media and art."

That's kind of what this whole article is about. Just training the systems in research is arguably fair use, but creating the entire pipeline might not be, and the "loophole" here is trying to claim no responsibility for the training at the center of it because that was technically done by a third party (...funded by the final creator of the entire pipeline).

pavlov 1382 days ago

The court’s summary also mentions this aspect of differing marketplaces:

“… the revelations [i.e. the information served by Google Book Search] do not provide a significant market substitute for the protected aspects of the originals.”

This doesn’t apply to AI image generators which are clearly a “market substitute” for the protected originals used to train the system. For this reason I’d expect someone like Getty to want to revisit Authors Guild v Google sooner rather than later.

moonchrome 1382 days ago

Can we first get an AI that's actually useful as a Getty substitute? All I'm seeing posted is visually pleasing nonsense. As soon as I tried to use it as something like a stock photo generator, it was unusable (e.g. key physical properties of the object would be off to the point where the object is useless, and in many cases it would look wrong even in a thumbnail).

The only thing I did see was designers cropping out the wrong part and filling in the blanks - I suppose it's competing with stock photos in that aspect.

gpderetta 1382 days ago

Did you try inpainting to fix the wrong bits? From my very little experience, AI image generation is not (yet) a one click process and requires multiple iterations to get close to the desired result, i.e. it is still more a tool for a designer than a replacement for it.

gpm 1382 days ago

AI image generators are clearly not a market substitute for images, they are a tool that can be used to create market substitutes, but not themselves one.

VanTheBrand 1382 days ago

Depends on what the product is. For example, with OpenAI's DALL-E 2 the product is very clearly the generated image. You even pay per image. Also, this is kind of what this article is about: arbitrarily separating the pieces in order to evade copyright.

WithinReason 1382 days ago

Note the "protected aspects of the originals" part. AI generated images don't produce outputs that contain protected aspects.

VanTheBrand 1382 days ago

That’s for a court to decide, ultimately. Something doesn’t have to be a bit for bit copy to be a protected aspect.

russellbeattie 1382 days ago

An important part of the opinion (on the wiki page you linked to) is completely missing in the case of AI datasets:

> It generates new audiences and creates new sources of income for authors and publishers.

This is definitely not the case for artists and photographers, who don't benefit at all from the transformative nature of the AI output, and in fact are significantly harmed since it dilutes the uniqueness of their work by allowing anyone to imitate their style. Though to my knowledge "style" isn't protected by copyright - only trademark - I can't imagine there won't be lawsuits about this in the future.

That one artist who complained that people can't find his original work online now because of so many imitated pics is definitely exhibit A in terms of direct harm.

9wzYQbTYsAIc 1382 days ago

> the revelations do not provide a significant market substitute for the protected aspects of the originals

It does seem like generative AI systems provide a significant market substitute, so this ruling probably wouldn’t apply, in court.

edit: see https://news.ycombinator.com/item?id=33194623 for some initial thoughts on how this problem (and others) could be rectified.

For example, with a database of protected works and self-censorship algorithms for generative AI systems, conscientiously objecting creatives could have a mechanism for excluding their works.
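As a rough illustration of that exclusion mechanism, here is a minimal sketch. Everything in it is hypothetical: the `OptOutRegistry` class, the `filter_dataset` helper, and the choice of SHA-256 fingerprints are assumptions made for the example, not an existing system or anything proposed in the linked comment.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest used as the work's identifier (an assumption)."""
    return hashlib.sha256(data).hexdigest()


class OptOutRegistry:
    """Hypothetical registry of works whose creators asked to be excluded."""

    def __init__(self) -> None:
        self._fingerprints: set[str] = set()

    def register(self, data: bytes) -> None:
        # Record the fingerprint rather than the work itself.
        self._fingerprints.add(fingerprint(data))

    def is_excluded(self, data: bytes) -> bool:
        return fingerprint(data) in self._fingerprints


def filter_dataset(items: list[bytes], registry: OptOutRegistry) -> list[bytes]:
    """Keep only the items that are not registered as opted out."""
    return [item for item in items if not registry.is_excluded(item)]
```

Exact hashing only catches byte-identical copies; a resized or re-encoded image would slip through, so a real system would presumably need some form of similarity matching instead.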

tpmoney 1382 days ago

A substitute for what though? Copyright law is only concerned with substituting the work under copyright. That is to say, the consideration is whether the infringing aspects of the secondary work would alter the demand and market for the work being infringed.

In all the talk about AI data laundering there really hasn't been any indication that the AI generated item substitutes for the item it's alleged to infringe on. Substituting for a whole profession and its practitioners doesn't enter into the concerns of copyright law. There might be some argument that it should (to "promote the progress of science and useful arts" as it were), but copyright law to my knowledge hasn't been used to prevent new tech from putting professionals as a whole out of business.

9wzYQbTYsAIc 1382 days ago

Stock photography seems to be the obvious instance - why bother paying for the labor to make a stock photo, when you can have a generative AI system create the photo for you?

And furthermore, has anyone demonstrated that it is or is not possible to fully, or substantially, recreate any given existing work using the right input prompts?

I’m interested to know more of the legal details, but my understanding of copyright law is such that it preserves the value of intellectual labor.

edit: on a certain level, the cat is already out of the bag, but that doesn’t mean that we should ignore the law, without some indication from lawmakers or government that they intend to adjust said laws

tpmoney 1382 days ago

This is precisely my point though, "stock photography" isn't an individual work. Copyright law doesn't apply because you can't infringe on the copyright of "stock photography" as a whole any more than you can infringe on the copyright of "rap" or "rock and roll" or "oil paintings".

Further, just because a new tech can substitute for a class of old tech doesn't (often, barring protectionist laws) mean the old tech gets to impose legal restrictions on new tech. To trot out an obligatory car analogy, the rise of the automobile was not legally hampered by the fact that it substituted for the products of buggy and whip makers. More relevantly, the rise of photography and cameras was not legally hampered by the fact that it substituted for many painters' products. The rise of stock photography itself wasn't legally hampered by the fact it substituted for the work of corporate artists. The rise of point and shoot cameras wasn't legally hampered because it substituted for the work of professional photographers.

Lastly if the argument is about that the tech makes it "possible to fully or substantially recreate any given existing work" using deliberate and specific inputs, well we've had plenty of legal precedent on that too. The same arguments were made about Xerox machines, about cassette tapes, about VCRs, about CD-Rs. The copyright holders pretty much lost in every case. At the point you are taking specific and deliberate actions to knowingly infringe on copyright is the point where the technology used is no longer relevant. The right inputs can be used to infringe on the copyright of Star Wars at any typewriter or computer keyboard in the world. The right inputs can be used to infringe on the copyrights of The Beatles at virtually any instrument. It is the act of infringing, not the technology, which is relevant here.

I believe in some countries, the copyright holders won a concession in the form of a tax levied against each CD-R and cassette tape sold, to be distributed to the recording industry. One wonders how the authors of those countries felt about not getting a cut of every Xerox machine sold.

cycomanic 1382 days ago

> Lastly if the argument is about that the tech makes it "possible to fully or substantially recreate any given existing work" using deliberate and specific inputs, well we've had plenty of legal precedent on that too. The same arguments were made about Xerox machines, about cassette tapes, about VCRs, about CD-Rs. The copyright holders pretty much lost in every case. At the point you are taking specific and deliberate actions to knowingly infringe on copyright is the point where the technology used is no longer relevant. The right inputs can be used to infringe on the copyright of Star Wars at any typewriter or computer keyboard in the world. The right inputs can be used to infringe on the copyrights of The Beatles at virtually any instrument. It is the act of infringing, not the technology, which is relevant here

There is a significant difference here though, a Xerox machine or a VCR itself does not contain a representation of the art they are copying, a DL network does. I am pretty certain the cases around Xerox/VCRs etc would have had a pretty different outcome if you could type a prompt into your machine "print a story about some kids in a wizard college fighting against the comeback of an evil sorcerer" and it would have put out something closely resembling Harry Potter.

9wzYQbTYsAIc 1382 days ago

> This is precisely my point though, "stock photography" isn't an individual work … It is the act of infringing, not the technology, which is relevant here.

I completely agree.

See https://news.ycombinator.com/item?id=33241173 for my comments on that topic. (edit: self-censorship, for example, can be extended to generative AI systems)

> if the argument is about that the tech makes it "possible to fully or substantially recreate any given existing work" using deliberate and specific inputs, well we've had plenty of legal precedent on that too.

In this case, however, we are talking about a computer program, and as such there are ways to legislate copyright protection (or other protections) without throwing the baby out with the bathwater.

See https://news.ycombinator.com/item?id=33194623 for some initial thoughts on that topic

numpad0 1382 days ago

So the value proposition is, it’s the exact same thing, but you won’t be paying for it because it’s not THE exact same thing?

9wzYQbTYsAIc 1382 days ago

As I understand it, that would be skirting the law and philosophical principles behind protectionism for intellectual labor.

If society doesn’t value commodity intellectual labor, then society may need to address the commoditization of intellectual labor, directly, through things like UBI / vocational rehabilitation, etc.

Similar arguments can be made about robots and the commoditization of manual labor.

dangerface 1382 days ago

> Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals.

So is digitizing a copyrighted VHS and hosting it via torrents also fair use? It's transformative, the public display of the video is limited, there is no market for VHS.

I don't get it. What's the difference, other than Google having deeper pockets than me?

authpor 1382 days ago

> I think allowing people to exclude themselves or their work from a dataset is necessary.

or they could open it all up for everybody and stop protecting the rights of dead people (authors dead less than 70 years ago)

then again, that will make the publishers starve... but why pretend publishing corporations need food?

TigeriusKirk 1382 days ago

My personal ideal outcome is that there's no opting out of having your intellectual output included in the training, but the resulting model is as a result available freely to the public.

In my utopia, the end results are models containing the sum total of human output, available to everyone.

What I think is unconscionable is training the models on public works and then retaining them exclusively for private use.

noduerme 1382 days ago

Why pretend that other corporations that vacuum up content and repackage it have rights to resell art that you want to strip from the original publishers? At least the publishers actually made a contract with the artists.

visarga 1382 days ago

For the first time there is a chance for Mickey Mouse to be free, I mean "In-the-style-of-Mickey-Mouse", his new name. When did we ever get such a chance for information freedom?

9wzYQbTYsAIc 1382 days ago

The Renaissance comes to mind: https://en.m.wikipedia.org/wiki/Renaissance

ad404b8a372f2b9 1382 days ago

This is larger than publishers, this is every artist, film-maker, photographer, every writer, every engineer, anybody who has ever written or created something and shared it publicly is liable to have their work assimilated and an infinite amount of derivatives produced with no control over how they're used and by whom.

Comment generated with gpt-neox prompt: Comment about AI and data collection and generation and its pitfalls, expressing concern, emphasis on professions, emphasis on automation, written by Stephen King, creative writing, award winning, trending on reddit, trending on hacker news, written by Greg Rutkowski, written by Zola, written by Voltaire, written by authpor, written by moyix.

(Just kidding, it wasn't AI generated but you see my point.)

mkaic 1382 days ago

> anybody who has ever written or created something and shared it publicly is liable to have their work assimilated and an infinite amount of derivatives produced with no control over how they're used and by whom.

This has been the case ever since people started putting their art on the Internet publicly. The only difference is that now it's algorithms creating the derivatives, not people.

harry8 1382 days ago

Yeah before the internet it never happened and nobody knew just how damn cliched Bill Shakespeare's plays are. Every line of Hamlet's soliloquy! It's insane!

ROTMetro 1382 days ago

Would we have Shakespeare's plays if he didn't make money? Which encourages better plays:

I write a play, and I can license theater companies to be able to perform it. Therefore better writers are attracted to the industry (instead of to, say, ad copywriting) and because of a higher-level product, the industry thrives.

I can write a passion play that the local theater will perform. I will not generate enough income to live from my product. I will not generate income from licensing my production because there is no copyright and my scripts would just get stolen/distributed freely. The industry has less quality productions. The majority of productions have no reputation of quality.

ad404b8a372f2b9 1382 days ago

This is not remotely the same, scale and barrier to entry matter. With stable diffusion I can pick any artist right now and create over 1000 derivative works by tomorrow morning in his style to the same degree of expertise with no training involved and no work required.

jstanley 1382 days ago

That's good!

Acting like it's a bad thing is just ludditery.

heavyset_go 1382 days ago

The Luddites weren't some cult of ignorant technophobes, they were highly-skilled middle class craftsmen and small business owners who went from being able to provide for their families to dying in utter destitution. The remainder of them were tried for machine breaking and were either executed by the state or exiled to penal colonies. They risked everything because everything was at stake, I have a hard time saying that their situation and outcomes were "good", and I have a hard time saying the same about similar situations that are playing out today.

ad404b8a372f2b9 1382 days ago

I wouldn't be so confident one way or another, this is too new. I think it's going to make a lot of things way more accessible and enable people to express their creative voice who couldn't before. On the other hand you're looking at the destruction of a lot of professions, and possibly overnight with the speed things are moving at. I think if we told every software engineer their skills were entirely obsolete and they had to change careers tomorrow, the reception would be much colder.

I remember when I started working on generative models in 2015, you could barely generate a blurry 40x40-pixel picture of a face. Two years later, 1024x1024 and almost indistinguishable from reality. Now every week we have a new revolutionary application coming out.

9wzYQbTYsAIc 1382 days ago

I think that the argument, overall, is that there are questions as to the legality of certain applications of the technology.

Society needs to change the laws regarding the preservation of value of intellectual labor, as has long been suggested.

Acting like the law doesn’t matter is a bad thing, if we are making value judgements.

bigiain 1382 days ago

> The only difference is that now it's algorithms creating the derivatives, not people.

I see more of a difference than that.

I really don't care if a person makes a meme out of one of my Flickr photos I've shared publicly.

I'm much more grumpy at the idea of Facebook/Google/Microsoft using my Flickr photos and "giving away the AI automemes" as a way to further lock people into their walled gardens of surveillance capitalism.

(Not enough that I actually care enough to do anything about it. I have my Flickr account set to default to CC BY 2.0 for uploads, and I try reasonably hard to remember to lock that down to All Rights Reserved if I'm uploading pics of family or friends. But I don't lose sleep over any of it. I do sometimes come across this pic of mine, which took on a life of its own and is all over the internet, at least in coffee-related places, and wistfully wonder if I could have gotten more credit for it... https://flic.kr/p/sVHP9 )

authpor 1382 days ago

this is larger than the arts. anybody who has ever participated creatively in our culture understands that it's absolute bullshit to pretend we need money in order to want to contribute artistically.

we need money because food is for sale, because most of us do not own where we live hence we are forced (a priori) to come up with a whole lot of money every month or else you're out in the streets.

ksidudwbw 1382 days ago

Learn to code! Oh wait...

ad404b8a372f2b9 1382 days ago

Sure but unless you bring down capitalism people will still need to work to eat and most will want to use their hard-earned creative skills to make a living.

Not only that, but being able to dedicate 8 to 10 hours a day to your craft for 40 years brings it to a level that you can't reach with casual practice.

9wzYQbTYsAIc 1382 days ago

> Sure but unless you bring down capitalism people will still need to work to eat and most will want to use their hard-earned creative skills to make a living.

The concept of a UBI (universal basic income) isn’t inherently in conflict with capitalism. I believe that it is actually in coherence with the idea of Universal Human Rights, as defined by the UN in the 1940s.

Perhaps that would be the culmination of anything good about capitalism.

nine_k 1382 days ago

The problem is that UBI is in conflict with arithmetic. Short of near-total redistribution, it's impossible to provide a decent level of UBI for everyone. Total redistribution doesn't work, because the economy needs markets as a means of price / demand discovery, and markets apparently lead to a power-law distribution, not a flat one.

IMHO, the realistic option is a thick enough safety net for those who are going through a rough spot, for the disabled, etc., via both taxes and charity. But the vast majority will have to work, in one way or another, until machines completely take over, like in the Culture books by Iain Banks.

authpor 1382 days ago

capitalism cannot be brought down. this one must fall on its own.

I consider France one of the best examples of capitalism https://www.express.co.uk/news/world/1683661/paris-protests-...

echelon 1382 days ago

Do we allow artists to withhold their works from the minds of eager, learning children? [1]

Tell me how ML is different than the mind of a toddler ravenous for new information.

For every billion dollar start-up using data at scale, there are tens of thousands more researchers and hobbyists doing the exact same, producing wonderful results and advances.

If we stop this growth dead in the tracks, other countries more willing to look past the IP laws will jump ahead. And if Stability locks away their secret sauce, some new party will come and give away the keys to the kingdom yet again.

You can't block the signal. Except, of course, by legislating against it in some Luddite hope we can prevent the future from happening.

Instead of worrying careers will end, we should look at this as being the end of specialization. No longer do we need to pay 20,000 hours to learn one thing to the exclusion of all others we would like to try. Now we'll be able to clearly articulate ourselves with art, music, poetry. We'll become powerful beings of thought and expression.

Humans aren't the end or the peak of evolution. We should be excited to watch this unfold.

[1] Maybe Disney would like you to pay more for a premium learning plan for your child, but thankfully that's not (yet) possible.

greysphere 1382 days ago

Most machine learning is assigning weights in a chain of matrix multiplications and normalization functions.

There is no known experimentally verifiable model of toddlers' brains, let alone one based on matrix multiplication and normalization. Developing such a model would be a noteworthy achievement.

Therefore these are different.
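To make "a chain of matrix multiplications and normalization functions" concrete, here is a minimal pure-Python sketch. The names `matvec`, `normalize`, and `forward` are made up for the example and the weights are hand-picked placeholders; real networks also include nonlinear activations, and training is the process that adjusts the weights.

```python
import math


def matvec(weights: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def normalize(x: list[float], eps: float = 1e-8) -> list[float]:
    """Shift to zero mean and scale to (roughly) unit variance."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(variance + eps) for v in x]


def forward(layers: list[list[list[float]]], x: list[float]) -> list[float]:
    """Apply each layer in order: matrix multiplication, then normalization."""
    for weights in layers:
        x = normalize(matvec(weights, x))
    return x


# Two hand-picked 2x2 layers, purely illustrative.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, -0.5], [1.0, 2.0]],
]
output = forward(layers, [1.0, 3.0])
```

Whatever one makes of the comparison with a toddler, this is the entire mechanism the comment describes: numbers in, multiplied by stored weights, rescaled, and passed to the next layer.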

9wzYQbTYsAIc 1382 days ago

Some Artificial Neural Networks have been shown to significantly (at least up to 50% concordance) model brain function.

Not to mention the laborious work of neuroscientists to build out the connectome of the human brain.

greysphere 1382 days ago

Two systems producing the same output for some set of inputs doesn't show that the systems are the same. My phone can produce the same results as my brain for short arithmetic problems. My phone is not a brain.

The neuroscientists I know in the field would be among the first to tell you that our ability to model the brain is nearly non existent. In fact we don't even have a great model of a single neuron [1]. This statement doesn't invalidate the work folks are doing to try and reach that goal. Biology is hard.

[1] https://en.m.wikipedia.org/wiki/Biological_neuron_model

mattkrause 1382 days ago

As a working neuroscientist, I’ll co-sign this!

Understanding 50% of the brain, whatever that would even mean, is an utter fantasy.

9wzYQbTYsAIc 1382 days ago

I should have clarified that I was talking about the specific brain function of semantic comprehension.

I am not suggesting that we are anywhere near having a complete analytical model of 50% of the brain.

I am suggesting that we do have tools to continue answering questions about functional aspects of the brain.

Or am I missing something that indicates the non-utility of “function analysis” of biology-based artificial neural networks?

9wzYQbTYsAIc 1382 days ago

See https://news.ycombinator.com/item?id=33197285

orbifold 1382 days ago

These articles use far more cautious language than you suggest and if they don't everyone working in the field is hopefully aware that such claims are the academic equivalent of clickbait at best.

boppo1 1382 days ago

>No longer do we need to pay 20,000 hours to learn one thing to the exclusion of all others we would like to try. Now we'll be able to clearly articulate ourselves with art, music, poetry. We'll become powerful beings of thought and expression.

I'm a 20000 hours person. Knowing what I know about what I do, it's real sad to see someone misunderstand what goes into creativity this egregiously. Prompt engineering is such an unbelievably watered down "version" of making a painting. It's like writing a page, or even a folder! of bullet points and handing it to a ghostwriter, then telling them "put the end result between Shakespeare and Poe".

That's not unleashing your creative voice. Unleashing your voice and acquiring technical skills in a chosen field are the same. If you endlessly mixed all the prior classical works, it doesn't matter how you weight them, it won't spit out Mozart. You're stuck in the gamut of the model, between the maxima points of each artist.
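Read literally, the "gamut" claim is a statement about weighted averages: a blend with non-negative weights can never leave the range spanned by its inputs. The sketch below is a toy illustration of that arithmetic only. Representing a work as a short list of numbers and the `mix` helper are assumptions made for the example, and it is not a description of how any actual generative model works.

```python
def mix(works: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of equally sized feature vectors.

    Assumes non-negative weights with a positive sum.
    """
    total = sum(weights)
    dims = len(works[0])
    return [
        sum(w * work[d] for w, work in zip(weights, works)) / total
        for d in range(dims)
    ]


# Three hypothetical "works", each reduced to two made-up features.
works = [[0.0, 10.0], [4.0, 2.0], [1.0, 5.0]]
blended = mix(works, [0.2, 0.5, 0.3])

# Every coordinate of the blend lies between the smallest and largest
# value that any single input has in that coordinate.
for d, value in enumerate(blended):
    column = [work[d] for work in works]
    assert min(column) <= value <= max(column)
```

Whether real models are limited in this way is exactly what the thread disputes; the sketch only shows what the claim would mean if generation were pure averaging.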

It's an incredible tool to generate stuff quickly, and to some extent it will help artists whose work depends on quantity over quality.

IanCal 1382 days ago

You can prompt with images, which lets you control colour and composition, and with masking you can iteratively work on sections to guide the image to what you are picturing. That can shift the creative part more towards the user.

boppo1 1382 days ago

Yes, I've seen the photoshop plugin. You're comparing playing with duplo blocks to marble sculpture.

9wzYQbTYsAIc 1382 days ago

> Tell me how ML is different than the mind of a toddler ravenous for new information.

If a person published a work that clearly plagiarized or violated a patent, that person would be open to legal action.

I’m all for systemic change, but uses like this may end up having a chilling effect on human-created work.

XorNot 1382 days ago

> I’m all for systemic change, but uses like this may end up having a chilling effect on human-created work.

Every time this comes up, whichever party fears for its livelihood always says something like this and ignores the other side: that rigorous enforcement activity is going to do the same thing to human-created work. Richard Stallman wrote a short story about this very issue.[1]

There are already people hurling abuse around on Twitter at artists because they think that something they made was produced with Stable Diffusion or something else.

[1] https://www.gnu.org/philosophy/right-to-read.en.html

9wzYQbTYsAIc 1382 days ago

> Every time this comes up, whichever party fears for its livelihood always says something like this and ignores the other side: that rigorous enforcement activity is going to do the same thing to human-created work.

I may be providing a counter-example to your argument.

At this time, I’m not advocating for anything other than self-censorship by generative AI systems (see https://news.ycombinator.com/item?id=33194623 for some initial thoughts) and, as aggregated from some of my other comments in this thread, the following:

I think that it will be important to ensure that we have symmetric information, going forward, otherwise trying to put the genie back in the bottle may just end up further disadvantaging those that try to follow the rules.

-

Society needs to change the laws regarding the preservation of value of intellectual labor, as has long been suggested.

Acting like the law doesn’t matter is a bad thing, if we are making value judgements.

-

If society doesn’t value commodity intellectual labor, then society may need to address the commoditization of intellectual labor, directly, through things like UBI / vocational rehabilitation, etc.

Similar arguments can be made about robots and the commoditization of manual labor.

YurgenJurgensen 1382 days ago

Funny that you cite Stallman, when Copilot using GPLed code in closed-source projects is a real concern.

numpad0 1382 days ago

The criticism is that AI works are not transformative, but are recognizable “regurgitation” of the training set.

It’s not that AIs are too good. They look like crude knockoff products to trained eyes. And crude knockoffs are usually considered bad things.

echelon 1382 days ago

"Good artists borrow, great artists steal."

A lot of artists get started with tracing before taking off the training wheels. You also see new art styles quickly proliferate across the entire community, so clearly there's some unspoken copying happening.

These models are producing new works in nearly identical styles. That's something a trained human could conceivably do.

lmm 1382 days ago

> A lot of artists get started with tracing before taking off the training wheels.

Sure, but only privately. Publishing something you traced is a massive no-no, and selling it even more so.

boppo1 1382 days ago

Picasso was a hack, and it's reflected in that quote of his.

numpad0 1382 days ago

Yeah, but when a couple of lines match up with existing art, you go up in flames and you change careers. NAI is doing the first half of that.

trention 1382 days ago

>Tell me how ML is different than the mind of a toddler ravenous for new information.

The toddler is human. AIs are not humans.

It's a human right to learn. Non-humans don't (and shouldn't) have human rights.

>Humans aren't the end or the peak of evolution. We should be excited to watch this unfold.

Spoken like a true evolutionary loser.

nightski 1382 days ago

Well a toddler isn’t making money off the information they are absorbing for one. If these are open to the public models that is one thing. But no, these are proprietary models whose sole purpose is to make money for large corporations.

echelon 1382 days ago

Artists and engineers do exactly this. It just takes a decade.

nightski 1382 days ago

They are taking your code verbatim and injecting it into numerous code bases around the world violating the license while getting paid for it?

_jal 1382 days ago

> Tell me how ML is different than the mind of a toddler ravenous for new information.

Well, I can't keep a toddler in a data center, pumping out work on demand. Or copyright it and limit who it chooses to work for when it grows up.

For instance.