| HN Mirror


	by TrueDuality 220 days ago
	This is a false equivalency that I'm surprised no one else has brought up. An archive of a site inherently preserves attribution; scraping and training do not.

2 comments

kulahan 220 days ago

Is it? I thought it was ridiculous at first, but the more I think of it... both are scenarios where a corporation is scraping billions of webpages. We like the reason archive.is does it, but unless it's some kind of charity, I think it's a reasonable comparison.

didibus 220 days ago

archive.is is a charity, no? Or at least they take donations. The legal entity behind it seems nebulous, but they don't have ads and have no paid product or offering.

hoistbypetard 220 days ago

They sure as shit do have ads. Have you ever accidentally followed a link using a browser profile that has no ad blocking enabled?

I only rarely browse without some form of content blocking (usually privacy-focused... that takes care of enough ads for me, most of the time). I keep a browser profile that's got no customizations at all, though, for verifying that bugs I see/want to report are not related to one of my extensions.

Every once in a while, I'll accidentally open a link to a news site (or to an archive of such a site) in that vanilla profile. I'm shocked at how many ads you see if you don't take some countermeasures.

I just confirmed in that profile: archive.is definitely puts ads around the sites they've archived.

didibus 219 days ago

I stand corrected. Maybe I never noticed because I use ad blockers.

And, admittedly, I used to think it was the Internet Archive.

It does make this case seem problematic now that I know the details.

warkdarrior 220 days ago

So if OpenAI or <AI scraper of the day> adds attribution to their AI-generated answers, everything is OK?

cestith 220 days ago

It would be closer to okay.
