by slashtab 670 days ago
Meanwhile, nothing happens to mega-corporations pirating data to train AI.

It’s not pirating. It’s transformative and fair use. Derivative even in some cases. Each piece of content is but a grain of sand on an island.

It’s called the open internet.

It's pirating when these scrapers ignore the terms of use and whatever the AI equivalent of robots.txt is, as has been widely reported.
The key here would be transformation rather than reproduction?

YouTube is mentioned in the 2013 brief:

> b. According to the YouTube “Terms of Service,” users who upload content to YouTube retain all of their ownership rights in their content. By uploading their content to YouTube, however, such users grant YouTube a license to use, reproduce, and distribute such content.

> c. In general, the further reproduction and distribution of videos that are taken from the Youtube.com platform violates the copyright of the individual who uploaded that video to Youtube.com.

In my country, making a copy of a copyrighted work is illegal unless you have permission from the copyright holder.

It’s impossible to use a copyrighted work to train without making a copy of it.

Merely making a copy isn't copyright infringement; otherwise your ISP, your browser cache, the CDNs caching data across the internet, your screen, your router, and about a million other components in the chain would each need a license for every piece of data.

Infringing copyright requires far more than this.

And if the output is transformative, then they can read whatever public-facing information they can find, just as you can.

If I steal a loaf of bread from a bakery to make a bread statue, what does the guy who delivered the bread have to do with my act of thievery? Is he also a thief just because the bread was transported to the bakery in his van?

What the heck are you going on about with the ISP, router, screen, etc.? And while we're at it, have you heard of HTTPS and what it does?

Read the thread. The upvotes are because I did read the thread before replying.
So how do browser caches work then?
Same as a cart in a store. If you take it out of the cart and save it to your home folder, well. If you put a web server on your home folder (or cache folder), well, well, well...
What if I don't, and just access the cache directly whenever I want? What if I take a cached picture, print it, and put it on my wall?

The bit-copying idea of copyright is so flawed it almost feels like satire.

So how much of a book of poems can I assemble from other books of poems and sell as an ebook on Amazon before it stops being "transformative and fair use"? A few words, a sentence, a chapter?
Syllables, honestly.
It's called the open internet when the same rule applies to everybody.

This is one rule for corporations and another rule for individuals.

Additionally, there is often little or no recourse when the latter are falsely accused.

Look at the frustration of YouTubers triggering DMCA takedowns by accidentally including 5 seconds of a commercial song.

Training is not pirating, but generating copyrighted data is.
If generating copyrighted data is pirating, then so is being served the literal images that are shared across the internet.

Should a corporation be able to sue you simply for sharing an image of Mickey?

There's a difference between a fair-use reproduction of Mickey and reproducing an image of Mickey that you claim is your own original creation (or there was until the copyright ran out recently).
You pirated all the books you have read, all the movies you have watched, and you're pirating this message right now.

Show me the money.