by nightpool 460 days ago
Yes, you're allowed to make personal copies of copyright works that you own. IANAL, but my understanding is that if you're using them for yourself, and you're not prevented from doing so by some sort of EULA or DRM, there's nothing in copyright law preventing you from e.g. photocopying a book and keeping a copy at home, as long as you don't distribute it. The test case here has always been CDs—you're allowed to make copies of CDs you legally own and keep one at home and one in your car.
3 comments

> Yes, you're allowed to make personal copies of copyright works that you own.

That’s not the point. It’s about books you don’t own. Are you allowed to download books from Z-Library, Sci-Hub etc. because you want to learn?

To the best of my knowledge, no individual has ever been sued or prosecuted specifically for downloading books. As long as you're not massively sharing them with others, it's not an issue in practice. Enjoy your reading and learning.
Aaron Swartz, cofounder of Reddit and inventor of RSS and Markdown, was hounded to death by an overzealous prosecutor for downloading articles from JSTOR, with the intent to learn from them. He was charged with over a million dollars in fines and could have faced 35 years in prison.

He and Sam Altman were in the same YC class. OpenAI is doing the same thing at a larger scale, and their technology actually reproduces and distributes copyrighted material. It's shameful that they are making claims that they aren't infringing creator's rights when they have scraped the entire internet.

https://flaminghydra.com/sam-altman-and-aaron-swartz-saw-the... https://en.wikipedia.org/wiki/Aaron_Swartz

I'm familiar with Aaron Swartz's case, and that is actually why I phrased it as "books". In any case, while tragic, Swartz wasn't prosecuted for copyright infringement, but rather for wire fraud and computer fraud due to the manner in which he bypassed protections in MIT's network and the JSTOR API. This wouldn't have been an issue if he downloaded the articles from a source that freely shared them, like sci-hub.
It would be incredibly naive to assume that the scraping done for these models did not at any point circumvent protections.

The fundamental contention is that both accessed, saved and distributed material that they didn't have a "right" to access, save, and distribute. One was made a billionaire for it and another was driven to suicide. It's not tragic, it's societal malpractice.

Will what OpenAI & others are doing serve as precedent for Alexandra Elbakyan of SciHub and avenge Aaron?

Cynically, I imagine it will not but I hope that it could.

You could argue that they are avenging him in doing exactly what he did, or worse, and not being punished for it. They are establishing precedent.
I'm responding specifically to this sentence:

> It's shameful that they are making claims that they aren't infringing creator's rights when they have scraped the entire internet.

Scraping the Internet is generally very different from piracy. You are given a limited right to that data when you access it, and you can make local copies. If further use does something sufficiently non-copying, then creator rights aren't being infringed.

Can you compress the internet including copyrighted material and then sell access to it?

At what percentage of lossy compression it becomes infringement?

> Can you compress the internet including copyrighted material and then sell access to it?

Define access?

If you mean sending out the compressed copy, generally no. For things people normally call compression.

If you want to run a search engine, then you should be fine.

> At what percentage of lossy compression it becomes infringement?

It would have to be very very lossy.

But some AI stuff is. For example there are image models with fewer parameters than source images. Those are, by and large, not able to store enough data to infringe with. (Copying can creep in with images that have multiple versions, but that's a small sliver of the data.)

When you identify where the infringing party has stored the source material in their artifact.{zip,pdf,safetensor,connectome,etc}. In ML, this discovery stage is called "mechanistic interpretability", and in humans it's called "illegal."
It was overzealous prosecution of the breaking into a closet to wire up some ethernet cables to gain access to the materials

Not the downloading with intent

And apparently the most controversial take on this community is the observation that many people would have done the trial, plea and time, regardless of how overzealous the prosecution was

> breaking into a closet

"The closet's door was kept unlocked, according to press reports"

When's the last time a kid with no record, a research fellow at Harvard, got threatened with 35 years for a simple B&E?

They threaten

It's the plea or sentencing where that stuff gets taken into account for a reduction to community service

Wasn’t John Gruber the inventor of Markdown?
> for downloading articles from JSTOR, with the intent to learn from them

For context, according to sources, he downloaded 4.8 million articles.

Maybe he was about to train an LLM on them /s
35 years is a press release sentence. The way DOJ calculates sentences when they write press releases ignores the alleged facts of the particular case and just uses for each charge the theoretical maximum sentence that someone could get for that charge.

To actually get that maximum typically requires things like the person is a repeat offender, drug dealing was involved, people were physically harmed, it involved organized crime, it involved terrorism, a large amount of money was involved, or other things that make it an unusually big and serious crime.

The DOJ knows exactly what they are alleging the defendant did. They could easily look at the various factors that affect sentencing for the charge, see which apply to that case, and come up with a realistic number, but that doesn't make it sound as impressive in the press release.

Another thing that inflates the numbers in the press releases is that defendants are often charged with several related charges. For many crimes there are groups of related charges that for sentencing get merged. If you are charged with say 3 charges from the same group and convicted on all you are only sentenced for whichever one of them has the longest sentence.

If you've got 3 charges from such a group in the press release the DOJ might just take the completely bogus maximum for each as described above and just add those 3 together.

Here's a good article on DOJ's ridiculous sentence numbers [1].

Here's a couple of articles from an expert in this area of law that look specifically at what Swartz was charged with and what kind of sentence he was actually looking at [2][3].

Why do you think Swartz was downloading the articles to learn from them? As far as I've seen, no one knows for sure what he was intending.

If he wanted to learn from JSTOR articles he could have downloaded them using the JSTOR account he had through his research fellowship at Harvard. Why go to MIT and use their public JSTOR WiFi access, and then when that was cut off hide a computer in a wiring closet hooked into their ethernet?

I've seen claims that what he wanted to do was meta-research about scientific publishing as a whole, which could explain why he needed to download more than he could with his normal JSTOR account from Harvard, but again, why do that using MIT's public WiFi access? JSTOR has granted more direct access to large amounts of data for such research. Did he talk to them first to try to get access that way?

[1] https://web.archive.org/web/20230107080107/https://www.popeh...

[2] https://volokh.com/2013/01/14/aaron-swartz-charges/

[3] https://volokh.com/2013/01/16/the-criminal-charges-against-a...

He might have wanted other people to have access to the knowledge, and for free. In comparison, AI companies want to sell access to the knowledge they got by scraping copyrighted works.
Wow, just wow.
Truly wow. The sucking up to corporations is terrifying. This, when Aaron Swartz was institutionally murdered by the institutions and the state for "copyright infringement". And what he did wasn't even for profit, or even 0.00001 of the scale of the theft that OpenAI and their ilk have done.

So it's totally OK to rip off and steal and lie through your teeth AND do it all for money, if you're a company. But if you're a human being, doing it not for profit but for the betterment of your own fellow humans, you deserve to be imprisoned and systematically murdered and driven to suicide.

Thank you for putting my sentiment into words. THIS. It's not power to the people, it's power to the oligarchs. Once you have enough power and, more importantly, wealth, you're welcomed into the fold with open arms. Just like how Spotify built a library of stolen music: as long as wealth was created, there is no problem, because wealth is just money taken from the people and given to the ruling class.
CDs, software, and electronic media, yes. Physical books, no. You can't make archival copies.
Sure you can. You could take a physical book and painstakingly copy it one page at a time; that is totally fair use.
Leaving aside the broader discussion...

You cannot legally photocopy an entire book even if you own a physical copy.

Internet people say you can, but there's no actual legal argument or case law to support that.

> Internet people say you can, but there's no actual legal argument or case law to support that.

Quite the opposite. The burden of proof is on you to show a single person ever, in history, who has been prosecuted for that.

If nobody in the world has ever been prosecuted for this, then that means it is either legal, or it is something else that is so effectively equivalent to "legal" that there is little point in using a different word.

If you want to take the position that, "uhhhhhhh, there is exactly 0% chance of anyone ever getting in trouble or being prosecuted for this, but I still don't think its legal, technically!"

Then I guess go ahead. But for those in the real world, those two things are almost equivalent.

> If you want to take the position that, "uhhhhhhh, there is exactly 0% chance of anyone ever getting in trouble or being prosecuted for this, but I still don't think its legal, technically!"

> Then I guess go ahead.

That is exactly what I am saying.

Gotcha, so then you agree that there are exactly zero cases or evidence of anyone ever being punished for this, which is the most important part.

If you do this, you are not going to be held legally liable for anything.

At home? Without ever sharing it with anyone? I thought making backups of things that you personally own was protected, at least in the US. Could you elaborate on my apparent misunderstanding?
> Could you elaborate on my apparent misunderstanding?

One of the six exclusive rights of copyright holders is "to reproduce the copyrighted work in copies or phonorecords."

(In certain circumstances, the Fair Use doctrine overrides this right, but reproduction in whole is not such a circumstance.)

I believe the post you are replying to is suggesting the copy is made by hand, one word at a time.
I don't see how that would be different, as the meaningful material is text not images.
Citation needed.
This is a specific exception in Australian copyright law. It allows reproducing works in books, newspapers and periodical publications in a different form for private and domestic use.

(Copyright Act 1968 Part III div. 1, section 43C) https://www.legislation.gov.au/C1968A00063/latest/text

It seems reasonably within the bounds described by fair use, but nobody's ever tested that particular constellation of factors in a lawsuit, so there's no precedent - hand copying a book, that is.

17 U.S.C. § 107 is the fair use carveout.

Interestingly, digitizing and copying a book on your own, for your own private use, has also not been brought to court. Major rights holders seem to not want this particular fair use precedent to be established, which it likely would be, and might then invalidate crucial standing for other cases in which certain interpretations of fair use are preferred.

Digitally copying media you own is fair use. I'll die on that hill. It doesn't grant commercial rights, you can't resell a copy as if it were the original, and so on, and so forth.

There's even a good case to be made for sharing a digitally copied work purchased legally, even to millions of people, 5 years after a book is first sold: for a vast majority of books, after 5 years, they've sold about 99.99% of the copies they're going to sell.

By sharing after the ~5 year mark, you're arguably doing marketing for the book, and if we cultivated a culture of direct donation to authors and content creators, it invalidates any of the reasons piracy is made illegal in the first place.

Right now publishers, studios, and platforms have a stranglehold on content markets, and the law serves them almost exclusively. It is exceedingly rare for the law to be invoked in defending or supporting an author or artist directly. It's very common for groups of wealthy lawyers LARPing as protectors of authors and artists to exploit the law and steal money from regular people.

Exclusively digital content should have a 3 year protected period, while physical works should get 5, whether it's text, audio, image, or video.

Once something is outside the protected period, it should be considered fair game for sharing until 20 years have passed, at which point it should enter public domain.

Copyright law serves two purposes - protecting and incentivizing content creators, and serving the interests of the public. A situation where a bunch of lawyers get rich by suing the pants off of regular people over technicalities is a despicable outcome.

> there's no precedent - hand copying a book, that is

Thank you! I had looked this up myself last week, so I knew this. I had long believed, as GP does, that copying anything you own without distribution is either allowed or fair use. I wanted GP to learn as I did.

For reference, here's the US legal code in question:

Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include— (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work. The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

The spirit seems apparent, but in practice it's been used by awful people to destroy lives and extract rent from artists and authors in damn near tyrannical ways.

Except you said "You can't make archival copies." and didn't provide a citation. That's quite a different claim than "there exists no precedent clearly establishing your right or lack thereof to make archival copies".
I take the contrary view.

What part of fair use pertains to making a physical copy of the complete work?

You can make copies of things. You just can’t distribute them
You're repeating upthread comments. And no, you can't. There's an archival exception for electronic media. If you want to make copies of physical media you either:

1. Can't

Or

2. Rely on fair use to protect you (archival by individuals isn't necessarily fair use)

It absolutely is fair use to copy a book for your personal archives.

The fair use criteria consider whether it is commercial in nature (in this case it is not) and “the effect of the use upon the potential market for or value of the copyrighted work”, which for a personal copy of a personally owned book is nonexistent.

https://www.law.cornell.edu/uscode/text/17/107

You would get laughed at by the legal system trying to prosecute an individual owner for copying a book they bought just to keep.

You may copy, but you may not circumvent the copy protection.
Correct. For electronic media.
I'm moving the goalposts here, since it was not OpenAI (as far as we know): where does Meta training on torrented data fit into this?