Hacker News
by ThomW 993 days ago
I think this is where all the AI stuff is headed.

Authors, artists, etc. are suing projects whose models just inhaled their work and generate derivative works and I'm surprised the studios aren't getting more involved with going after AI models that included their properties in them.

I can see this going the route of us having a Disney AI, Sony AI, Discovery AI, Amazon AI, etc. where you can generate stuff using models owned by the studios, but only those studios and any public domain stuff they suck in too.


> I can see this going the route of us having a Disney AI, Sony AI, Discovery AI, Amazon AI, etc.

Exactly. These companies are probably hoping the artists win as many legal battles as they can, since the result will be that only big companies will be able to create useful AI models.

Creators with decent-sized portfolios will just train their own models.

> He told me the training process took about 2.5 hours on a GPU at Vast.ai, and cost less than $2.

This is a SaaS waiting to happen.

https://waxy.org/2022/11/invasive-diffusion-how-one-unwillin... This was on the front page last year.

That’s for a fine-tune of an existing SD model that has already been trained on a mountain of data. For a fully clean model you would also need a 100% clean base model, and training one from scratch requires a mountain of image data and a lot more compute, but then yes, it’s totally doable.
There are a couple of existing pretrained SD models trained only on CC0/public domain data. I think at this point they're still significantly lower quality than other popular models, but I'm sure that will improve over time.
I think what Getty has done is the most "artist-friendly" version of this.

Presumably when photographers/artists submit their images to Getty, they hand over full ownership or at least agree to a pretty broad licensing agreement. If Getty is the rights holder for these images, it can use those rights to train its own models.

In my mind, OpenAI/Midjourney/Stable Diffusion are the Napsters of generative AI. Adobe Firefly, and now Getty, are coming up with the iTunes Store/Spotify. For better or worse.

It's legal, but not particularly friendly, to hold someone to a prior agreement like "you can mostly do what you want with X" when what is possible to do with X changes significantly.
Hey, at least Getty handed over some money as the basis for that claim; OpenAI et al. are leaning on "our scraper brought it back" to make their version of that exact claim.
I think it would be more accurate to describe this as the most Getty-friendly version.
I guess in the same way you could quibble about whether Spotify is artist-friendly or record-label-friendly when the alternative is torrenting.
I don't think there is any "quibble" involved in whether record labels are good or bad for artists. And claiming that the alternative is "torrenting" is a huge straw man.

The alternative to Spotify and the traditional record labels are places like Bandcamp, where artists get a far more significant portion of the earned money. Any industry where 99% of the money goes to the middleman or service provider rather than the creator is an industry in dire need of disruption.

My limited understanding is that Getty is at least as bad as the music industry, if not much worse. No doubt Getty will make millions of dollars from their AI image generator, while the artists may get a few cents a year if they are lucky.

Edit: also, re Spotify, as far as I'm aware they do carry small and independent labels. Getty is comparable to the big music labels, not to Spotify.

Anyone can publish their music on Spotify without a label through a service such as DistroKid. You can even enter your own label name, so yeah, Spotify seems pretty label-agnostic. Not sure how much the big labels end up paying them at the end of the day, though.
Very different -- Spotify significantly increased the amount of music people consume and the number of artists who can easily get compensated (listing on Spotify is much easier than releasing a CD).

In this case: Getty is going to be paying less overall (otherwise, why do it at all?), and they will pocket the extra margin.

Right? Isn't this effectively what the actors' guild was striking over, just in photographer form?
> I can see this going the route of us having a Disney AI, Sony AI, Discovery AI, Amazon AI, etc. where you can generate stuff using models owned by the studios, but only those studios and any public domain stuff they suck in too.

As entertaining as it would be to have a model where you type "photo of an astronaut riding a horse" and it defaults to Buzz Lightyear riding the horse from Tangled, that's not really what people use these models for in the main.

You don't actually want derivative works. The interesting training data isn't Hollywood movies, it's all the junk people post on social media. What you want is thousands of generic pictures of astronauts and thousands of generic pictures of horses. Pictures of Tom Hanks as Jim Lovell aren't any better (and may actually be worse) than actual public domain photos from NASA.

Almost all my test queries have been “Homer Simpson eating a donut”.

I think the vast majority of this is going to replace Google Image Search, the way it replaced clip art.

> I'm surprised the studios aren't getting more involved with going after AI models that included their properties in them.

If you're the first group to sue you have to spend millions extra on lawyers to establish the precedent. I don't see much reason for Disney and friends to rush to be first.

The current state of AI isn't really a threat to Disney and friends, just suggestive of a future threat. No need to rush on that account either, especially since, if they win the lawsuit later, they'll still benefit from all the R&D on neural networks that is happening on other people's dime right now.

And all in, do studios stand to lose more or gain more from AI? Drastically cutting their costs might be worth some extra competition.

It's a lose-lose for the studios. If they win on the IP side, all of the generative content of the future will exclude their IP. If they lose, then they don't get paid for it, or worse they have to pay for it.

Most people seem to have a difficult time grasping that if the model is trained in such a way that Disney's IP is excluded, the model doesn't know Disney even exists. Imagine if every Disney website were excluded from Google and every Disney-related trademark were blacklisted.

As of mid-2023, it is very clear personal assistants are going to replace traditional web search. For my use, they've replaced it 100%, except for searching a single website or product name. Will these generative assistants, which must be capable of both processing and generating images, know the rights holder's IP even exists? The idea that a useful LLM is going to be trained on 100% public domain, copyright-free data is absurd.

IP owners are suggesting that future personal assistants would need to pay them merely for knowing that their IP exists. That would be like requiring Google to pay the copyright owner per search for every website it indexes and every trademarked keyword. That isn't possible. In fact, the reverse occurs.

What if the future is in fact the opposite of what IP owners are now suggesting? Nike pays the generative AI company for Nike shoes to appear in the one-off movie generated for a single viewer based on their personal preferences?

The world is drowning in IP: text, video, music. The future will be a deluge.

The other angle on ‘Corporate AI’ is when we’ll start to see product placement and adverts inside generated content. Create an image of coffee, and you’ll find Starbucks logos everywhere. Ask an LLM about a topic and see it work in an advert about a particular brand of beer. I’m sure people are working on this already, but I really hope it never happens.
Oh god, yeah, that sounds horrible. If this happens, people will just hold onto the endless Stable Diffusion base models we've got now and never update anything.
That sounds like a fine way to encourage the success of offshore-hosted, pirate/gray-market AI image generators.
Just like how every time streaming gets more convoluted more people turn to torrenting. These companies manage to find ways to hurt themselves while thinking they're forcing people into paying them more.
Doesn't seem like it's hurting Netflix / Disney all that much. If I had 100 million subscribers paying me $100/year, I'm pretty sure I'd be crying into my caviar.

Admittedly, they're both nonsensically incompetent at times, and if you believe Disney's financials (do you believe any Hollywood financials?), they somehow manage to lose money on $10 billion a year. I'm still not sure how you take in the equivalent of tens of thousands of TV episodes' production costs ... and then lose money ... with Disney's back catalogue and vault.

+1. Large-scale owners of IP should all be working toward this; it just makes sense. This is also a big point of contention for the writers' and actors' guilds, who understand this dynamic very clearly.

There was a thread yesterday about legal data: https://news.ycombinator.com/item?id=37627129

I've been saying all year that LexisNexis absolutely HAS to be building a training corpus. It was a big deal when that dumb lawyer in TX filed a brief with bogus citations, but what if you had an LLM that was specifically trained on legal filings, understood rules like "If you cite a case, you MUST include a reference to that case and it must be valid in our system"?
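A rough sketch of the "every cited case must be valid in our system" rule, written as a post-generation check on a drafted brief. This is only an illustration under loud assumptions: `KNOWN_CASES` is a hypothetical hard-coded stand-in for a real case database (it does not model any LexisNexis API), and the regex only recognizes a few simple reporter-style citations such as `347 U.S. 483`. Real citation formats are far more varied.

```python
import re

# Hypothetical stand-in for a legal citation database; a real system would
# query a research service rather than use a hard-coded set.
KNOWN_CASES: frozenset[str] = frozenset({
    "347 U.S. 483",
    "410 U.S. 113",
})

# Simplified (assumed) pattern: volume, reporter, first page, e.g.
# "347 U.S. 483", "123 F.3d 456", "88 F. Supp. 2d 10", "5 S. Ct. 20".
CITATION_RE = re.compile(
    r"\b\d{1,4} (?:U\.S\.|F\.\d?d|F\. Supp\. \dd|S\. Ct\.) \d{1,5}\b"
)


def find_citations(text: str) -> list[str]:
    """Return every reporter-style citation in the text, in order of appearance."""
    # No capturing groups in the pattern, so findall returns whole matches.
    return CITATION_RE.findall(text)


def invalid_citations(text: str, known: frozenset[str] = KNOWN_CASES) -> list[str]:
    """Return the citations that are not found in the known-case set."""
    return [c for c in find_citations(text) if c not in known]


if __name__ == "__main__":
    draft = "See Brown, 347 U.S. 483, and Smith, 999 F.3d 123."
    print(invalid_citations(draft))
```

In an LLM pipeline, a check like this would run on the generated filing before anyone files it: a non-empty result means the model cited something the system can't verify, and the draft gets flagged or regenerated.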

Agree. It also seems they are going to build it whether customers want it or regulators object.
Brian Eno's dream is finally coming true.
And/or Distracted Boyfriend's worst nightmare