
Oh, for sure, they get all pearl clutchy when others try to do exactly what they have done, and they get all "not like that!" about it. The US is a society run by lawyers, and the big corps have the best lawyers. Maybe we can legislate out of the hole at some point, but it's a pretty grim outlook. Google et al also don't have to have the law on their side, they can simply litigate people and businesses into bankruptcy, regardless of the legal merit of their actions.

At any rate, there are ways of staking a legitimate claim to content you publish online, but even if you do, it may not matter. Robots.txt is a convention, not a regulation or law. It's respected as a social nicety, not because it's strictly legally required.
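To make the "convention, not law" point concrete, here's a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the bot names, and the `may_fetch` helper are all made up for illustration. The thing to notice is that the check runs entirely on the crawler's side: a crawler that never calls it fetches the page just the same.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (not taken from any real one).
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the hypothetical robots.txt above permits this fetch.

    This is a purely voluntary, client-side check. Nothing on the server
    enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        for url in ("https://example.com/post/1", "https://example.com/private/x"):
            print(agent, url, may_fetch(agent, url))
```

A polite crawler calls something like `may_fetch` before every request and skips whatever comes back False. An impolite one just skips the call.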

If you publish your data to a website where it's publicly visible, you are inviting the world to come download your data. When that data leaves your server and goes to live on the downloader's computer, the downloader can do whatever they want with that data.

It's not clear that it's legally possible to prevent the use of data in training models unless you require someone to sign a contract to that effect before being allowed to download your data.

That would be obnoxious, and I wouldn't bother with your content anymore. Like Instagram, LinkedIn, and Twitter, your site would get a 127.0.0.1 hosts file entry.

The US needs a clear, modern update to copyright law that upholds and maximizes individual rights while addressing privacy and property concerns. We shouldn't be playing this game where we pretend a website is somehow analogous to a page of text scribed with a quill pen, using laws developed back when quill and parchment were relevant.

Let's write some new laws that regulate things as they actually are, instead of performing tortuous mental gymnastics to contort and butcher existing laws and precedents into saying whatever the most expensive lawyers want.

Maybe the social contract allows for people to prevent their conversations from being scraped and used by third parties without explicit consent, even if the conversation is entirely public. I don't like that view, but I see the argument for it.

As things stand, though, fair use and public access make things pretty clear: rulings in various AI cases so far have favored broad fair use interpretations and are requiring complainants to show specific, particular harms. If and when those harms are shown, we'll see whether any carveouts get made, or whether broad fair use interpretations become the baseline for content scraping going forward.