Hacker News
by mwint 962 days ago
Genuine question - does it seem plausible that a few GB of content could truly be wiped off the Internet?

I’m not in the piracy scene, but my impression was that they routinely pass full-resolution movies around the Internet with little barrier to discovering and downloading them, at least for technically competent users. Is that still true?

4 comments

This is really not about an engineer keeping a bootleg model in their basement. It's about the barrier to entry for commercial products. Or the ability to curate improved open-source implementations over the long haul, for that matter, since past a certain scale this entails creating a non-profit of some sort to pay your bills.

Plus, while it's definitely the case that old data tends to stick around as long as there is sustained interest... the moment the interest wanes, it's gone. I've been on the internet for a while, and there are so many hobby sites, forums, and software projects from the early days that are simply gone for good (and not on archive.org).

The Pile was. It’s still available but no one will touch it, mostly due to books3.

The difference is that a few people with lots of resources take on legal risk. In the piracy example many people with few resources take on risk, which works out since no one wants to sue people with no money.

The Pile is still used to train LLMs and it's still very much available on the net. I agree it's a risk to train your models on the dataset until the legal implications are worked out, but it doesn't seem to be stopping people.

The purpose of regulations like these is not to prevent a thing from happening. It is to criminalize normal behavior without enforcing the rule unless you happen to rock the boat some day.

The old models will become deprecated if they are not upgraded, and won't incorporate new information. Even if the files are available, they will become abandonware.