by trampi 2552 days ago
While your comment is correct, this would not have prevented the issue stated in the article.

> After determining the source IP of the crawler (one of the admins was experimenting with Nutch as a supplement to the wiki's impoverished search capabilities and had authenticated the crawler using their admin credentials)

The real problem is that a GET request is meant to be side-effect free. A crawler that issues only GET requests should not be able to modify anything, e.g. global settings, even when it authenticates with an admin token.
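To make that concrete, here is a minimal sketch of a handler that only changes state on POST, so admin credentials alone are not enough for a GET to modify anything. The `Wiki` class, the `/admin/search/disable` path, and the `is_admin` flag are all made up for illustration; they are not taken from the article.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    """Hypothetical wiki with one global setting (names are illustrative only)."""
    settings: dict = field(default_factory=lambda: {"search_enabled": True})

    def handle(self, method: str, path: str, is_admin: bool = False) -> int:
        """Return an HTTP status code for a hypothetical admin endpoint."""
        if path != "/admin/search/disable":
            return 404  # unknown path
        if method != "POST":
            # GET (and every other method) never mutates state here,
            # no matter which credentials come with the request.
            return 405  # Method Not Allowed
        if not is_admin:
            return 403  # Forbidden
        self.settings["search_enabled"] = False
        return 200
```

With this shape, a crawler that follows every link as an admin gets a 405 on the settings URL and the setting stays untouched; only a deliberate POST from an admin changes it.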


> GET request is meant to be side-effect free

Meant, yes, by somebody, but a web service creator can decide otherwise, for some reasons, like simplicity.

If I remember correctly, there was a good story about Viaweb and how they figured out that requests to follow links could be used as commands, and I'm not sure they didn't use GET for that... but maybe I'm wrong.

> Meant, yes, by somebody, but a web service creator can decide otherwise, for some reasons, like simplicity.

Simply introducing unexpected side effects, you mean? Using methods in ways they're not meant to be used doesn't create simplicity, it creates complexity. Suddenly there are exceptions to the standards that you need to remember and take into account. This story demonstrates that very well.

Laziness is not the same thing as simplicity.

> Meant, yes, by somebody, but a web service creator can decide otherwise, for some reasons, like simplicity.

As the conclusion of the article shows, if you do that you're breaking the contracts of the web platform. It's a sure way to build a web service that won't play well with other software on the web, like a search crawler.

> but a web service creator can decide otherwise

This is like "undefined behavior" in C. Sure, you can make a GET have side effects in your application, but everything else will still assume that HEAD and GET are free of side effects, and might repeat requests, omit requests (using cached data), or even issue requests speculatively in advance.
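A rough sketch of the "repeat" and "omit" cases, assuming a made-up `CounterService` whose GET `/increment` has a side effect and a made-up `CachingClient` that treats GET as safe. Neither models a real library; the retry loop just simulates lost responses.

```python
class CounterService:
    """Hypothetical service where GET /increment changes state."""

    def __init__(self):
        self.count = 0

    def get(self, path):
        if path == "/increment":
            self.count += 1  # side effect hidden behind a GET
        return str(self.count)


class CachingClient:
    """Hypothetical client that assumes GET is safe: it caches and retries freely."""

    def __init__(self, service):
        self.service = service
        self.cache = {}

    def get(self, path, retries=0):
        if path in self.cache:
            return self.cache[path]  # request omitted, served from cache
        # Simulate `retries` lost responses followed by one that arrives:
        # the service still sees (and acts on) every attempt.
        for _ in range(retries + 1):
            body = self.service.get(path)
        self.cache[path] = body
        return body


service = CounterService()
client = CachingClient(service)
client.get("/increment", retries=2)  # one logical call, three increments
client.get("/increment")             # cached, so no increment at all
```

The caller asked for two increments and the service performed three, none of them when the caller expected. With a side-effect-free GET, both the retries and the cache hit would have been harmless.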

This reminds me of the story of an internet-connected garage door opener that uses GET for opening and closing.

> https://twitter.com/rombulow/status/990684453734203392?lang=...

TL;DR: Safari notices that the user visits this page very frequently, so whenever Safari opens, it prefetches the garage door "page" link to cache the response before the user navigates anywhere. That request immediately opens the garage door.
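The same failure in a few lines, as a sketch only: `GarageDoor`, its `/door` path, and `prefetch` are hypothetical stand-ins, not the actual device's API or Safari's real prefetch logic.

```python
class GarageDoor:
    """Hypothetical door controller; `toggle_on_get` selects the unsafe design."""

    def __init__(self, toggle_on_get):
        self.toggle_on_get = toggle_on_get
        self.is_open = False

    def handle(self, method, path):
        if path != "/door":
            return 404
        # POST always toggles; GET toggles only in the unsafe design.
        if method == "POST" or (method == "GET" and self.toggle_on_get):
            self.is_open = not self.is_open
        return 200


def prefetch(server, frequent_paths):
    """Stand-in for a browser warming its cache: it only ever issues GETs."""
    for path in frequent_paths:
        server.handle("GET", path)


unsafe_door = GarageDoor(toggle_on_get=True)
safe_door = GarageDoor(toggle_on_get=False)
prefetch(unsafe_door, ["/door"])  # door opens although nobody asked
prefetch(safe_door, ["/door"])    # GET only reads, door stays closed
```

In the second design the door moves only on an explicit POST, which a prefetcher never sends.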