| HN Mirror



	by windhaven 461 days ago
	From the post: “It is important to us that we don’t generate inaccurate content that contributes to the spread of misinformation on the Internet, so the content we generate is real and related to scientific facts, just not relevant or proprietary to the site being crawled.” Okay, why should I care if a crawler that is clearly doing something it shouldn’t receives misinformation?

4 comments

mentalgear 458 days ago

That's actually a good strategy. It avoids adding more false information to the infosphere while de-incentivising the crawlers from returning to the site (since they don't find the information they are looking for there).

exeldapp 458 days ago

I could imagine more sophisticated crawlers being able to detect false information and then avoid those pages, but maybe that's more far-fetched than it seems in my mind.

bithavoc 461 days ago

I guess if the crawlers can’t actually detect the trap, the misinformation would be attributed to your website whenever model responses expose content attribution tags to end users.

dns_snek 458 days ago

LLMs already make up citations, so everyone would assume it was just the model spewing nonsense citations.

__MatrixMan__ 458 days ago

Because people will later vote based on that misinformation.
