Hacker News
by rmilejczz 955 days ago
I was scraping a WordPress site a few months ago using Go and I had to spoof my user agent to get results. So it definitely happens.
1 comment

But the site remains unnamed. To prove it, why not let others test the theory? Tell us which WordPress site it was.

IME, WordPress sites do not require a user agent header. Contrast that with, for example, Squarespace sites, which do require one.

IME, if I send a user agent header with a particular value, then some sites will block the request, depending on the value. Whereas if I _do not send_ a user agent header, then almost all sites will accept it.

The so-called "developers" who publish code and commentary about "scraping" almost always include a user agent header. Usually they use fake values. They try to guess the "correct" values to send.

I'm not referring to sending fake values. I'm referring to not sending the header at all. No "spoofing" is involved. This has worked for me for decades, across thousands of websites.
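
For anyone who wants to test this themselves, here is a minimal sketch in Python. It relies on the standard library's http.client, which adds only Host and Accept-Encoding on its own and no user agent header. The function name fetch_without_user_agent and the example URL are my own placeholders, not something from the parent comment. Note that most higher-level HTTP clients add a default user agent unless told otherwise, so the choice of library matters.

```python
import http.client
from urllib.parse import urlsplit


def fetch_without_user_agent(url, timeout=10.0):
    """GET `url` without sending any User-Agent header.

    Hypothetical helper for illustration. Returns (status, body_bytes).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    # port=None lets http.client pick the default port for the scheme.
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # No headers passed: http.client sends Host and Accept-Encoding,
        # and nothing else. In particular, no User-Agent line is written.
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    # example.com is a placeholder; substitute the site you want to test.
    status, body = fetch_without_user_agent("https://example.com/")
    print(status, len(body))
```

Run it against a site once as is, then again with a user agent header added, and compare the status codes. That is the whole experiment.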