by windhaven 461 days ago
From the post: “It is important to us that we don’t generate inaccurate content that contributes to the spread of misinformation on the Internet, so the content we generate is real and related to scientific facts, just not relevant or proprietary to the site being crawled.”

Okay, why should I care if a crawler that is clearly doing something it shouldn’t receives misinformation?

4 comments

That's actually a good strategy. It avoids adding more false information to the infosphere while de-incentivising the crawlers from returning to the site (since they don't find the information they are looking for there).
I could imagine more sophisticated crawlers might be able to detect false information and then avoid those pages, but maybe that's more far-fetched than it seems in my mind.
I guess if the crawlers can’t actually see the trap, the misinformation would be attributed to your website whenever model responses expose content attribution tags to end users.
LLMs already make up citations; everyone would assume it was just the model spewing nonsense citations.
Because people will later vote based on that misinformation.