

by skilled 930 days ago

“We can steal people’s copyrighted content but we can’t let you see it for yourself.”

Outside of privacy (leaking PII), the above is likely the main reason. Someone could have invested a lump of money to scrape as much as they can and then go to town in the courts.

The terms that prohibit it are under “2. Usage Requirements” that restrict reverse engineering the underlying model structure.

1 comments

donpark 930 days ago

Leaking original data would expose the company to direct copyright violation lawsuits. Changing the ToS is the simplest way to stave off the legal risk exposure, buying time to implement technical remedies.

As ridiculous as it may seem, they're doing the right thing.

gumballindie 930 days ago

I always find it amusing when criminals threaten legal action. Happens all the time. They steal your property then they cower behind their legal rights.

4gotunameagain 930 days ago

The right thing would've been to not train the model on data they do not own, or that they do not have permission to use.

namlem 930 days ago

Well, I'm glad they did the wrong thing then.

4gotunameagain 930 days ago

Are you also glad that Microsoft, a behemoth, is profiting from your hard work by training their Copilot on your code in GitHub?

Will you be glad when those systems are good enough to replace you, and they become so using your toil, for free?

xkcd-sucks 930 days ago

I've had no problem using Microsoft's toil for free by downloading free Windows ISOs all my life, so if they want to pirate my GitHub code it's not bad enough to care about. Besides the bad practices the model might internalize as a result, that is.

4gotunameagain 930 days ago

Then speak for yourself, because I have been using Linux as a daily driver for almost my whole life.

I do not want corporate behemoths to profit from my work for free. Period.

__loam 930 days ago

Be careful saying that kind of stuff here. People get really mad when you tell them their new toys aren't ethical.

gumballindie 930 days ago

I made it my mission to get the lot of them mad. There are plenty of legitimate AI companies out there, but YC seems fond of the unethical ones, which explains the influx of IP-stealing startups on here and their simps.

__loam 930 days ago

Hell yeah.

dvfjsdhgfv 930 days ago

It will be interesting to see how it plays out. I can imagine Wiley, McGraw Hill, Pearson and other publishers[0] of educational content OpenAI used could sell the rights to their material to be used for training GPT, but the price would be high enough that we would be paying $100/month instead of $20.

[0] Heck, they could even unite and found an LLM startup themselves, training the models legally and making them available to users at various tiers.

catchnear4321 930 days ago

“don’t touch the unsecured, loaded firearm that is sitting on the counter, that might be stolen, maybe even got a body on it, don’t look too close, or you can be kicked out of the club for not following the rules”

so if this is what the right thing looks like…

malfist 930 days ago

> As ridiculous as it may seem, they're doing the right thing.

Making it against the rules to be able to prove their illegal behavior is not the right thing to do.
