by seydor 964 days ago
Let's not be so hasty. I'm glad to see that MistralAI is signing this letter. Their small model is truly amazing, the most useful open source model I've tried so far. But this can change tomorrow: they may be deemed illegal and disappear from public view. It has happened before. So I'm downloading and saving whatever model I think is great.

Genuine question - does it seem plausible that a few GB of content could truly be wiped off the Internet?

I’m not in the piracy scene, but my impression was they routinely pass full res movies around the Internet without much barrier to discovering and downloading them, at least to technically competent users. Is that still true?

This is really not about an engineer keeping a bootleg model in their basement. It's about the barrier to entry for commercial products. Or the ability to curate improved open-source implementations in the long haul, for that matter - as past a certain scale, this entails creating a non-profit of some sort to pay your bills.

Plus, while it's definitely the case that with sustained interest, old data tends to linger... the moment the interest wanes, it's gone. I've been on the internet for a while and there are so many hobby sites, forums, and software projects from the early days that are simply gone for good (and not on archive.org).

The Pile was. It’s still available but no one will touch it, mostly due to books3.

The difference is that a few people with lots of resources take on legal risk. In the piracy example many people with few resources take on risk, which works out since no one wants to sue people with no money.

The Pile is still used to train LLMs and it's still very much available on the net. I agree it's a risk to train your models on the dataset until the legal implications are worked out, but it doesn't seem to be stopping people.

The purpose of regulations like these is not to prevent a thing from happening. They exist so that normal behavior is criminalized but not enforced unless you happen to rock the boat some day.

The old models will become deprecated if they are not upgraded, and won't incorporate new information. Even if the files are available, they will become abandonware.