by sph 881 days ago
The Internet is a wild place, and I reckon 90% of the complexity of a crawler is dealing with workarounds and non-compliant servers (cough www.apple.com cough).

I'll have a look, thanks for the heads up.

1 comment

Are you setting the headers to make the sites think it's a browser?

Edit: "User-agent: bernard/1.0"

I bet that's going to cause issues.

I'd fake a browser user agent for off-domain sites.
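Something like this is what I mean, as a rough sketch only. I don't know how the crawler is actually written, so the function names, the `example.org` home domain, and the browser-style string below are all made up for illustration; only `bernard/1.0` comes from the header above. It uses nothing but Python's standard library and only builds the request, it doesn't send it.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# The crawler's own identifier, as quoted in this thread.
CRAWLER_UA = "bernard/1.0"
# Hypothetical browser-like string, for illustration only.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def is_off_domain(url, home_domain):
    """Return True when the URL's host is neither home_domain nor a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    home = home_domain.lower()
    return not (host == home or host.endswith("." + home))


def user_agent_for(url, home_domain):
    """Pick the browser-like string for off-domain URLs, the crawler's own otherwise."""
    return BROWSER_UA if is_off_domain(url, home_domain) else CRAWLER_UA


def build_request(url, home_domain):
    """Build (but do not send) a request carrying the chosen User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent_for(url, home_domain)})


if __name__ == "__main__":
    for url in ("https://example.org/page", "https://www.apple.com/"):
        req = build_request(url, "example.org")
        print(url, "->", req.get_header("User-agent"))
```

The subdomain check is deliberately strict (`"." + home`), so a lookalike host such as `notexample.org` still counts as off domain.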