| HN Mirror


	by progbits 297 days ago
	CF is trying to double dip: they are charging users for their CDN, and now they are also trying to charge for the privilege of accessing their users' content. While I love to see OpenAI get scammed, I don't think it will stop there. How cheap and useful do you think Kagi or other search engines can stay with this racket? How will Internet Archive operate?

5 comments

adriand 297 days ago

How is this a racket? This is a service website owners want, and it (that is, Cloudflare’s resurrection of the 402 Payment Required response) seems to be one of the few schemes that can work at scale. The current situation, where AI companies benefit from content created under the premise of advertising revenue, is not just unethical, it’s uneconomical to the point of driving content creators out of business.

jychang 297 days ago

Yes, I agree here.

Everyone should remember that the limitations of technology are not meant to define society. Instead, we build edge cases into technology to better match society’s general expectations.

A website owner saying “yes normal humans, no bad bots, EXCEPT good bots” is totally fine.

stevenicr 296 days ago

Didn't they turn this on by default?

If website owners truly wanted it, it would be a 'do this thing to opt in' feature and everyone would rush to it.

Now I do think this kind of thing is good for many reasons, but I also see many reasons this can be problematic (that I did not consider the first time I read about it).

I myself would prefer an option to throttle the bots and give them a 'you can spider between 2am and 5am, once per month' allowance via robots.txt, a header, or something similar.

If you come more than twice in a month, you get blocked, or you pay for access to a static version hosted on another server / CDN.

Best of both worlds, without some of the negative issues.
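
A minimal sketch of the throttling idea described above, under stated assumptions: the class name, the return strings, and the choice to count one `decide` call as one whole crawl session are all hypothetical, and the thresholds are only a loose reading of the comment (a 2am to 5am window in server-local time, two free visits per calendar month). It is not how Cloudflare or any existing product works.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical policy values, loosely taken from the comment above.
WINDOW_START_HOUR = 2        # 2am, server-local time (assumption)
WINDOW_END_HOUR = 5          # 5am, exclusive (assumption)
FREE_VISITS_PER_MONTH = 2    # a third visit in the same month is refused


class CrawlThrottle:
    """Tracks crawl sessions per bot and calendar month (in memory only)."""

    def __init__(self):
        # (bot name, year, month) -> number of crawl sessions already allowed
        self._visits = defaultdict(int)

    def decide(self, bot: str, now: datetime) -> str:
        """Return 'allow', 'outside_window', or 'pay_or_block' for one session."""
        if not (WINDOW_START_HOUR <= now.hour < WINDOW_END_HOUR):
            return "outside_window"
        key = (bot, now.year, now.month)
        if self._visits[key] >= FREE_VISITS_PER_MONTH:
            # Over the monthly allowance: block, or redirect to a paid static copy.
            return "pay_or_block"
        self._visits[key] += 1
        return "allow"
```

A site would call `decide` once when a known crawler starts a session and map the result to whatever response it prefers, for example a block page or a pointer to a paid static mirror.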

Otherwise it's a play that helps Cloudflare more than anyone else, and hurts more than just OpenAI and other AI companies, IMHO.

lxgr 297 days ago

> How will Internet Archive operate?

Presumably less and less effectively, at least if they continue honoring robots.txt and don't implement mechanisms to bypass scraping protection.

https://www.theverge.com/news/757538/reddit-internet-archive...
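
For readers unfamiliar with what "honoring robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleArchiveBot` user agent are hypothetical and chosen only for illustration; this says nothing about how the Internet Archive's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt would fetch this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A compliant crawler would run a check like this before every request and skip any URL for which it returns `False`; nothing enforces that, which is why the thread distinguishes honoring robots.txt from bypassing active scraping protection.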

walski 297 days ago

IA has not honored robots.txt for the better part of a decade now.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

lxgr 297 days ago

Are you sure? The article you've linked (from 2017) only mentions "U.S. government and military web sites", and their Wayback Machine FAQ still mentions that robots.txt "might" prevent crawling:

https://help.archive.org/help/using-the-wayback-machine/

overfeed 297 days ago

Interestingly, the article states that Cloudflare is uncertain whether the Internet Archive respects robots.txt.

rsync 297 days ago

"CF is trying to double dip: they are charging users for their CDN, and now they try to also charge for the privilege of accessing their user's content."

Don't forget that Cloudflare provides service to the very botnets and flooders/booters they purport to protect against.

Would that be triple-dipping? Or do we have a special term for this specific behavior?

tonyhart7 297 days ago

"Don't forget that cloudflare provides service to the very botnets and flooders/booters they purport to protect against."

and where is the evidence???

m3047 296 days ago

Cloudflare provides anonymization infrastructure to alleged VPN users. (It was news to me! Why are CF assets actively reaching out to my infrastructure when I'm not a customer?) That's one data point. It doesn't mean they don't make an effort to screen abuse, but based on traffic to my site, how good that screening is remains an open question. I'm also not convinced I should believe they don't use that traffic for their own purposes just because "Simon says so".

janderson215 297 days ago

Yes, it’s called tripping.

theptip 296 days ago

Doesn’t actually seem like double-dipping.

Users are paying for a service that was priced 5-10 years ago based on human web traffic.

Now AI crawlers are a new source of huge traffic volume and CF is figuring out how to cover costs or profit from that load.

Markets change and so should cost structures.

toomuchtodo 297 days ago

The Internet Archive will potentially receive an exemption if they embargo crawled content and keep it dark (stored but not publicly available) until an agreed-upon future date.
