by leobg 857 days ago
How can they license something that they didn’t author? Yes, they have TOS. But training generative AI wasn’t something that existed when ~99% of Reddit’s content was created, hence users could not possibly have consented to it. Besides, at least in Germany, TOS cannot contain regulations that are “surprising” or “unexpected”. Using my content to serve ads is one thing I might expect. But licensing it out for a fee to third parties? I don’t think so.
7 comments

> When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

From the oldest version of their ToS[1]. This is unchanged in the newest versions, even for the EEA[2]. It seems pretty clear that whatever AI training is doing is covered by "use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display" in "media formats and channels now known or later developed anywhere in the world" (emphasis mine).

[1] https://www.redditinc.com/policies/user-agreement-october-15...

[2] https://www.redditinc.com/policies/user-agreement-february-1...

At least in Germany, such agreements are, afaik, invalid, and without a severability clause possibly all the other clauses are too. Simply because something like copyright cannot be assigned in Germany. Secondly, there are ways to use Reddit without ever having agreed to the ToS.
That clause does not assign copyright. You explicitly keep your own copyright (in the previous clause, I didn't reproduce it above). You just grant them a license to use your content in the ways they listed.
How do you make a comment without creating an account, which requires you to agree to the ToS?
In Germany, lopsided contract clauses that surrender all rights are void. GDPR also gives you the right to recall all your data, so you should be able to delete your account and all your data.
This is (hopefully) the major difference between web 2.0 and Web3. In the latter, the goal is to build services where you actually own your content.

It remains to be seen whether this can actually happen.

People have actually been doing stuff like this since way before the LLM thing. I've bought books containing collections of stories from websites.

Craigslist Confessional: A Collection of Secrets from Anonymous Strangers https://www.amazon.com/Craigslist-Confessional-Collection-Se...

PostSecret: Extraordinary Confessions from Ordinary Lives https://www.amazon.com/PostSecret-Extraordinary-Confessions-...

Stoned, Naked, and Looking in My Neighbor's Window: The Best Confessions from GroupHug.us - https://www.amazon.com/Stoned-Naked-Looking-Neighbors-Window... (actually a great book)

People can and will profit from things you do in life for free. I feel like we accepted that a very long time ago?

Perhaps it isn't legally a license deal, but rather unlimited scraping, i.e. database access.

The AI company just trains its models on that data and isn't creating a derivative work in the legal sense.

That's not going to fly in any court.

"We didn't license it to them for the express purpose of training their model on this data, we only gave them database access for the express purpose of training their model on this data."

FWIW, they have already been licensing their data for years to social media management platforms (e.g. SproutSocial, Sprinklr).
Because it is Reddit, one of the most vile companies on the planet.
Says the guy with the Reddit meme username on Hacker News.
> How can they license something that they didn’t author?

Capitalism in a nutshell

It’s an American company. Bullets didn’t exist when the 2nd Amendment was created, but they’re still protected as arms. Also, a basic internet concept: if you do something on someone else’s property, it’s theirs. This site is also a source of training material for ML and has been for a very long time.
> Bullets didn’t exist when the 2nd amendment was created

Uh, what? Wikipedia dates the first bullet to the 13th century:

> Fire lance barrels made of metal appeared by 1276.[30] Earlier in 1259 a pellet wad that filled the barrel was recorded to have been used as a fire lance projectile, making it the first recorded bullet in history

https://en.wikipedia.org/wiki/Gun#Transition_to_true_guns

I think they’re referring to modern cartridge bullets, as opposed to little lead balls stored separately from the powder and whatever else is rammed down the barrel in preparation for the next shot.

Indeed, the first integrated cartridges were developed around 1808: https://en.m.wikipedia.org/wiki/Cartridge_(firearms)

Paper cartridges packaging one projectile with one shot's worth of powder were in widespread use in the 16th century.

https://en.wikipedia.org/wiki/Paper_cartridge

Nah nah, because, um, the foundin fathers were like dumb and that and couldn’t um, HIGH CAPACITY