by matt_daemon 520 days ago
Funnily enough, the Cloudflare blog identifies Perplexity engaging in dodgy practices to evade robots.txt denylists:

> Sadly, we’ve observed bot operators attempt to appear as though they are a real browser by using a spoofed user agent. We’ve monitored this activity over time, and we’re proud to say that our global machine learning model has always recognized this activity as a bot, even when operators lie about their user agent.

Clearly not working too well.

1 comment

Lol, I had to report Facebook's crawler to them as a bot because they misclassified it, even though it was using the documented Facebook crawler UA and coming from Facebook's ASN. Don't expect too much from their global machine. I wonder if this case also included people manually reporting it...