|
|
|
|
|
by cameroncairns
1560 days ago
|
|
Really great techniques listed in this thread! I wanted to point out though that it's generally nicer to the website owner if you enable `Accept-Encoding: gzip, deflate`. The difference in the amount of bandwidth charges for the site owner is quite significant, especially should you want to do comprehensive crawls. Yes, go ahead and disable that header when piping curl's output into `less`, however when converting the curl request into python just remember to re-add that header. Pretty much every python library I've used to handle web requests will automatically unzip the response from the server so you don't need to futz about with the zipping/unzipping logic yourself. |
|