by pierat 816 days ago
Wew! The big problem isn't just with pirate videos...

"According to local news reports, the service generated R$4,542,034 ($912,000) in revenue over twelve months"

This wasn't some torrent stream for "anyone, for free, have at!", but a paywall for content.

Amongst old pirates, charging money is the big "DON'T DO, EVER" flag. And sharing with absolutely no money or goods changing hands is "cleaner" than groups that gatekeep piracy with money.

-------------

At least in the USA, we need some sort of mandatory licensing for video reproduction, similar to ASCAP/BMI/SESAC for music. I just count how many times it was played, and send in a royalty check.

If that was the case, I could create really cool themed setups, share with friends, take some money, and pay royalties on playback. For music, I *can* do that. But for video? Nooooooope.

And because that doesn't exist for video, we have, like, what, 50 different streaming (pile of crap) services, and we're back to cable channel hell.

4 comments

Oh no, he almost made a million exploiting others' copyrighted content. Unacceptable!

Now, what shall we do about OpenAI stealing immeasurable amounts of copyrighted data? Oh, nothing? Because it's fine to break the law if you're doing it under the guise of a company, but if it's an individual their life should be ruined forever? Makes sense!

If ChatGPT reproduces copyrighted content verbatim in a way that isn't defensible as fair use, OpenAI can be sued. Of course, that would be just one instance of copyright violation, which isn't that big of a deal, so in order for the rights holders to make it sting they'd have to prove that a very large number of people were prompting in this very specific way with the intent of piracy.

On the flip side, it wouldn't be hard to put guardrails on ChatGPT output so that if too large a percentage of an answer is verbatim, it's blocked.
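
For what it's worth, here's a rough sketch of what a guardrail like that might look like. This is purely illustrative and not how OpenAI or anyone else actually does it: the function names (`build_index`, `verbatim_fraction`, `should_block`), the 8-word window and the 50% threshold are all assumptions I made up for the example. It indexes word n-grams from a set of protected texts, then measures what fraction of an answer's words fall inside any n-gram found in that index.

```python
def word_ngrams(text, n):
    """Return the set of n-word windows in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(protected_texts, n=8):
    """Hypothetical index: every n-word window seen in any protected text."""
    index = set()
    for text in protected_texts:
        index |= word_ngrams(text, n)
    return index


def verbatim_fraction(answer, index, n=8):
    """Fraction of the answer's words covered by an n-gram present in the index."""
    words = answer.lower().split()
    if len(words) < n:
        return 0.0
    covered = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in index:
            # Mark every word in the matching window as copied.
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(words)


def should_block(answer, index, n=8, threshold=0.5):
    """Block when more than `threshold` of the answer is verbatim (assumed policy)."""
    return verbatim_fraction(answer, index, n) > threshold
```

Note that the index here is built directly from the protected texts themselves, and exact n-gram matching is trivially defeated by light paraphrasing, so treat it as a toy.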

> If ChatGPT verbatim reproduces

Copyright covers "derivative works." Verbatim is absolutely not a requirement for infringement.

If you take a copyrighted image and modify it, even to the point where it's unrecognizable, if the image is being used in the same way (i.e., isn't a "transformative use"), then it's still a derivative work.

Yes, you are likely to get away with it if you're not caught. But that doesn't mean what you're doing is considered fair use, just that you won't get sued.

Thing is, every piece of text generated by ChatGPT is incrementally using every character of training data. So legally speaking, everything it produces is arguably a derivative work of ALL of the training data.

Generative AI isn't even a legal gray area; under current law, there's no blanket exception for "how much" of a copyrighted work is used. At best there's a fair use _guideline_ that lists, as one of four criteria, the amount and nature of the copyrighted work used. But really it's the entirety of millions of copyrighted works being used to generate the models, and those works _can_ be reproduced verbatim in many cases, proving that the works are encoded into the model.

Generative AI is only permitted because there's big money behind it along with associated lobbyists. And there are many in-flight lawsuits trying to shut down both GPT and various art-generating AIs.

Maybe they'll change the law. Maybe courts will side with the AI companies. But until then, it seems obvious to me that anyone arguing that generative AI based on models built with copyrighted works is completely legal is using motivated reasoning.

I understand OpenAI is a US company, but this is a US-centric view, especially since TFA is about a Brazilian operation.

> under current law, there's no blanket exception for "how much" of a copyrighted work is used

Under fair dealing laws, there is. [1] Though, as always, if commercial fan art is legal, then so should be something that uses only a couple of bytes of information per work, bar overfits.

> But until then, it seems obvious to me that anyone arguing that generative AI based on models built with copyrighted works is completely legal is using motivated reasoning.

It is completely legal in the EU, Japan, South Korea and Singapore. [2]

[1] https://libhelp.ncl.ac.uk/faq/43267

[2] https://www.reedsmith.com/en/perspectives/ai-in-entertainmen...

Your link re: Fair Dealing guidelines does NOT make it 100% legal. For one, the ENTIRE works are encoded into the model--not a part of them. For another, those are just guidelines, not explicit exceptions, just like Fair Use in the US. It's all very hand-wavy, even more so in the UK, apparently, so there's no way you can list those guidelines and say that anything is clearly allowed.

Your second link means it's legal for them to CREATE THE MODEL. This is true in the US as well: The model is a clearly transformative use of the data.

But as soon as the model produces works in the same use category as the original work (code -> model -> code, for instance, or image -> model -> image), it is no longer transformative.

If you understand the law and the technology, it's clearly generating derivative works.

Entire works are encoded in the model in the same way that if I cut up a document into individual words and put it in a bag with a bunch of other documents, if I was a no-life loser I could spend a long time "recreating" the document from individual words. The bag of cutout words is NOT a copyright violation though.

I'm wary of how hard the law is likely to stick to, e.g., "verbatim," since that implies there is a meaningfully "creative" step that the computer is doing for purposes of escaping "infringement."

Let's say I take a copyrighted picture, make it into a jigsaw puzzle and leave out a few pieces; I can't reproduce the original, but that's still certainly infringement.

If the correct assembly of the jigsaw puzzle was not the original picture, but a rearrangement of the original picture with arguable satire/social commentary (such as people's heads being where their crotches should be), then that becomes fair use.

No dispute there; and this will get us to what will be the ultimate question: An AI thing, or a human, could both end up with a result that looks like what you're describing. The question will be -- should these two things be seen as different, legally?

I'm fairly certain they should be seen as different, on their face, from a public policy point of view, i.e. I'm presently very comfortable ducking the question of "can they think," and for the present, assuming they do not -- otherwise you're essentially saying that non-human AI tools are "humans" for the purpose of copyright infringement.

> On the flip side, it wouldn't be hard to put guardrails on chatgpt output so that if too large a percentage of an answer is verbatim, it's blocked.

It wouldn't be hard conceptually, but it would be a copyright violation unless OpenAI could establish a novel kind of fair use, distinct from the AI-training fair use they rely on for ChatGPT not to be a copyright violation no matter what output it produces. What it would involve is building a database that is a mechanical copy of all the copyright-protected works in ChatGPT's training set, integrating it as part of the commercial ChatGPT product, and consulting it with some kind of full-text search on each generation from ChatGPT to verify that no passage of sufficient length was reproduced verbatim.

Not necessarily. YouTube has fingerprints of copyrighted works for this exact purpose, and it works fine.

YouTube Content ID is based on a specific agreement with the individual content owner that permits the specific use. Which works for YouTube because it's for UGC, not content YouTube generates.

What we need is traceability from training data to final output in AIs, with those works cited as sources and stored in the metadata for the produced output. That way there is no question as to what works were consulted, and people can check to see if their copyrighted work was used without permission by the LLM/diffusion model.

I understand this is a hard problem, but lots of tech needs to solve hard problems, and if AI was anything but a plot by the billionaire class to obsolete the need to pay professionals to do work, it would be required. As the people in power benefit from it, nothing will probably be done on this front.

The difference is who the stakeholders are.

If the stakeholder has lawyers then it's prison sentences, if the stakeholder is millions of randos then it's all cool.

If I read 30 stories in the NYTimes about Nixon and write a paragraph using said knowledge, it's not copyright infringement. So why do you think it is when OpenAI writes it?

If I look at 1000 photos of apples from iStockPhoto and draw a picture of an apple, it's not copyright infringement. So why do you think it is when OpenAI draws it?

Reasonably sure neither of those would be copyright infringements in any case.

Drawings of apples cannot be copyrighted, copyright requires at least some form of originality...

And this is the one that might differ based on jurisdiction, but writing about a public figure cannot be copyrighted either, especially if the knowledge was made publicly available. Now assume I've covered every possible if and maybe, because there are a billion of them for every country.

Before getting into all the legal technicalities: You are human, ChatGPT is not. That gives you rights, and ChatGPT none.

I'm a fairly hardcore pirate, but as soon as money starts changing hands, I'm out. You don't need to be a lawyer to see the trouble that's going to bring...

My favorite part is the streaming operators who are so very confident they'll never be caught because they have some fancy tech stack they think makes them technically immune to prosecution.

They're actually providing a service. It is easy to consume pirated content, but having it curated, listed and streamed directly to your TV adds value, so people are willing to pay for it.

The article does not list the organization's internal operating expenses. I can think of two possibilities:

The Good

The pirate was a Fully Automated Communist. The service was consuming $75,000 a month in hosting, and turning zero of that into profit. All staffers were unpaid volunteers working to sabotage mass media profits.

This is the zero marginal cost model for sabotaging profits. I salute any pirate who uses it. If this was the case then I wish I could volunteer to serve their jail time for them. This person is a hero.

The Sad

This pirate was drawing a profit and personal income from the revenue stream. Then I still salute their ability to scale a service and meet user needs. But I hope the time they now serve will harden them rather than break them, and awaken them to the cause and need for post-profit piracy models.

Perhaps we can send this person a letter to extend our support and hopes to them.