If the AI crawlers circumvent the protection mechanisms, it's a serious crime now rather than just "Well, it was on the open internet for free". Wouldn't surprise me if the news orgs are also looking at honeypot articles to see if the fake details slip into LLMs.
It's not a serious crime, or any crime at all, to ignore robots.txt. It's entirely voluntary whether you want to follow it or not. If you don't, you're being a dick maybe, but that's not a crime.
It's not just robots.txt. If you've tried using a VPN lately, so many sites like Reddit/YouTube/etc. block you from viewing content until you log in. Every major website seems to have added anti-scraping tech in the last year. Even archive.org is getting blocked by more and more sites, since it can be used for indirect scraping.
It's hardly even a polite ask. It's literally a text file. Automated HTTP clients, such as search engine indexers (Google, Yahoo, etc.), are expected to use it to know which pages they may visit. That expectation is nothing more than a convention.
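To make the "convention" point concrete, here is a minimal sketch of what a polite client does, using Python's standard urllib.robotparser. The robots.txt contents, the "ExampleBot" user agent, the example.com URLs, and the helper name polite_can_fetch are all made up for illustration. Note that the check only happens because the client chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/news/story",
        "https://example.com/private/draft",
    ):
        # A polite crawler would skip any URL for which this prints False.
        print(url, polite_can_fetch(ROBOTS_TXT, "ExampleBot", url))
```

Nothing on the server side enforces the answer. A client that never calls something like this is not stopped by the file at all.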
If you are on a Mac or Linux computer, odds are it has a program called curl pre-installed. If you type curl followed by a website address in a terminal, it'll make a request and download the response. robots.txt never gets involved. The same is true for AI agents and search engines that aren't polite.
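If you want to see that for yourself without poking at anyone's real site, here is a rough sketch in Python that stands in for the curl example. It starts a throwaway server on localhost whose robots.txt disallows everything, fetches a page directly, and records which paths were actually requested. The names ToySite, fetch_without_asking, and /private/article are hypothetical, and this only models a plain HTTP client, not any particular crawler.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

requested_paths = []  # every path the toy server is asked for


class ToySite(BaseHTTPRequestHandler):
    """Toy site: serves a 'disallow everything' robots.txt and a dummy page."""

    def do_GET(self):
        requested_paths.append(self.path)
        if self.path == "/robots.txt":
            body = b"User-agent: *\nDisallow: /\n"
        else:
            body = b"article text\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_without_asking(path: str):
    """Fetch one path directly; return (response body, paths the server saw)."""
    requested_paths.clear()
    server = HTTPServer(("127.0.0.1", 0), ToySite)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}{path}"
        # An empty ProxyHandler keeps the request on localhost even if
        # proxy environment variables are set.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            body = response.read()
    finally:
        server.shutdown()
        server.server_close()
    return body, list(requested_paths)


if __name__ == "__main__":
    body, seen = fetch_without_asking("/private/article")
    print(body)  # the page content, despite "Disallow: /"
    print(seen)  # the only path requested is the page itself
```

The server hands over the page even though its robots.txt says "Disallow: /", and the request log never shows /robots.txt being fetched, because the client simply never asked for it.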
Regardless, requiring an account to read anything, even a "free" one, totally changes the whole situation. That holds even where a site's terms of service are limited by local law.
Great Firewall actually. Robots.txt depended on the integrity of the companies crawling. I think they have demonstrated how much integrity they actually have:)
Plus it's a pretty dangerous game for them to play against large, powerful actors with legions of lawyers.