Good idea, but only if the article can't be edited during that week. What's worth preserving is the version the audience actually read. Articles routinely get ninja-edited after publication, sometimes repeatedly. Changelogs should be mandatory, but they're useless if we can't keep them honest.
They're blocking archives because people can go to the archive instead of the news site to bypass paywalls and avoid targeted adverts. It's also to prevent AI scrapers from harvesting articles.
I meant that news sites should provide an API for the Internet Archive to scrape their articles at all times, so it catches changes, but without any public access for an indefinite period (as an escrow), and then eventually release everything once the AI scraping issue blows over.
I don't know if this is still the case, but if I told IA via robots.txt not to archive my site, it would still crawl and archive it, just not display it, until I shut the site down. Once robots.txt was no longer reachable, they would display the archived content. The only way to stop that was to start the site back up, making robots.txt reachable again, and wait for them to recrawl it.
It's not about the paywall in this case. It's to prevent AI companies from scraping a publication's archives for training data. If AI companies want that data, they can compensate publishers, not extract it for free from the Internet Archive.
Yes, it's probably cheaper to just download the newspaper articles from the Internet Archive than to buy them directly from the newspapers. Training cost minimization, or should we call it stealing?
Do these major publications charge per article? They should, but they don't. So their whole pitch is that in aggregate (i.e. access to everything, including old articles) they're worth paying for monthly.
How would the archive not be a revenue drain if there were pay-per-read articles? I would think the incentive to find a free version would increase, not decrease, especially for a wide class of articles that are basically "I'm curious, but not that curious": in aggregate I might pay for them (they add value to my subscription), but individually paying for one feels wasteful (do I really want to pay to satisfy this curiosity?).
The article is about AI companies using the Internet Archive to source training data, not about people using it to avoid paywalls. AI companies don't care that the data is one week old.
You people need to stop saying this. You're being greedy when you buy groceries from a cheaper supermarket. You're being greedy when you negotiate your salary or choose a job based on pay, or anything where you're trying to get more stuff for yourself. Those things are all perfectly good behaviors, they make the world more productive, so everyone wins overall. Greed isn't a problem.
Spite? There's no evidence of that. They probably just don't want to lose the money from paying customers and ads. You're just making things up. Perhaps you're projecting your own spite.
1. buying the cheapest groceries you can reasonably find
2. trying to get the highest salary you can
3. literally any time you try to get more for yourself
That's a weak list from which to conclude that greed isn't a problem, especially since in cases 1 and 2 someone else is making money off you, the person who's supposedly greedy in these scenarios.
2. is literally you making money off someone else. That someone else might not be making money off your work in turn: you might be selling services to an individual for their personal consumption, or, more commonly, you might be doing so through an intermediary (an employer) that connects consumers with producers and launders the guilt of demanding more money.
1. Do farmers count as greedily making money off you when they try to get the highest prices for their produce from distributors and retailers, who are in turn competing for customers with low prices? Yes, but that's good! Every player in that supply chain is optimizing for themselves, and it ends up working pretty well for everyone. Or do you think farmers are too rich and shouldn't demand so much for their produce because greed is bad?
Consensual trade between rational actors leads to both of them benefiting. Before capitalism, the ideology which focused on property rights and individual freedoms didn't really have a name...
It's clear that people place some non-zero value on archival content. It should be unsurprising that news outlets also place some non-zero value on it. Given that they place some non-zero value on it, it is unsurprising that they do not give it away for zero. Disagreeing with their estimation of the value is understandable, but surely it's easy to see why most news outlets do what they do.