Hacker News
Implementing Google Safe Browsing server-side to sanitize untrusted input (codeascraft.etsy.com)
46 points by kellanem 5217 days ago
2 comments

Interesting idea. There are some extra challenges when the scanning is not done live.

Does Etsy accept url shorteners? Including some that allow editing? Is it feasible to rewrite the content to have the redirected URLs?

We "unroll" redirects as much as possible so that we can check the URL at the end.
In the sense of following them, or rewriting them as well?
We follow them on the server side and check the final URL we get to against the GSB database. We don't modify the user-generated content.
I assume you check each step of the unrolling? Otherwise a malicious site could easily do:

   if (is_etsy_ip()) { header('Location: http://www.google.com/'); die(); }
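For what it's worth, checking every hop rather than only the final URL could look roughly like the sketch below. This is not Etsy's code: `unroll_and_check`, `http_next_location`, the `is_flagged` callback, and the 10-hop cap are all made-up names and values for illustration. It catches a flagged intermediate URL, but it does nothing against a server that sends a different redirect to the scanner's IP, which is the cloaking trick above.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def http_next_location(url, timeout=5.0):
    """Return the Location header if `url` answers with a redirect, else None.

    Hypothetical helper: issues one HEAD request and never follows the
    redirect itself, so the caller gets to inspect each hop.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if resp.status in REDIRECT_CODES:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


def unroll_and_check(url, next_location, is_flagged, max_hops=10):
    """Follow a redirect chain one hop at a time, checking every URL.

    `next_location(url)` returns the redirect target or None.
    `is_flagged(url)` stands in for a lookup against a local blacklist.
    Returns (chain, flagged_url); flagged_url is None if nothing matched.
    Stops on a redirect loop or after `max_hops` redirects.
    """
    chain = []
    current = url
    while (current is not None and current not in chain
           and len(chain) <= max_hops):
        chain.append(current)
        if is_flagged(current):
            return chain, current
        target = next_location(current)
        # Location may be relative, so resolve it against the current URL.
        current = urljoin(current, target) if target else None
    return chain, None
```

In real use you would pass `http_next_location` as `next_location`. A chain that loops or hits the hop cap comes back with no flagged URL, so the caller may want to treat an unusually long chain as suspicious in its own right.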
Well, generally following the redirects is actually somewhat redundant. The idea of GSB is that URLs that lead to bad things would all be identified and added to the database.

Customising attacks for a given site specifically adds complexity and cost to the attack, which is really the aim for all of this sort of work. Everything you can do to drive up the cost of the attack makes you a less inviting target.

It would be a mistake to think that gsb4ugc (or tools like it) would protect everyone all the time. It's never a replacement for vigilance and education on the user side, just a useful extra line of defense.

Privacy implications?
The privacy risk for GSB in general is that you are sharing URLs with a (trusted) third party. That's most acute with the REST API, but most implementations (including gsb4ugc) cache a local copy of the lookup tables and so don't actually send URLs to Google. There is still a very occasional need to send a link to Google for validation, but in the server-side case the only context Google has for the request is the IP address of the server, which minimizes privacy risks as much as possible.
With GSBv2, no URL is sent to Google. They recently introduced the Lookup API where users need to send the URL, but this is not what Etsy is using.
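To make the "no URL is sent" point concrete, here is a rough sketch of the local lookup as I understand GSB v2: the client keeps a set of short SHA-256 hash prefixes and only contacts Google when a prefix matches, and even then only with the prefix. The function names are invented, and the candidate generation is a simplified stand-in for the real canonicalization and host-suffix/path-prefix rules, so treat it as an illustration rather than the actual protocol.

```python
import hashlib

PREFIX_LEN = 4  # bytes; assumes the 32-bit prefixes used by the v2 lists


def hash_prefix(expression):
    """First PREFIX_LEN bytes of the SHA-256 of a host/path expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:PREFIX_LEN]


def candidate_expressions(host, path):
    """Simplified candidates: host suffixes crossed with {path, "/"}.

    The real spec has more rules (canonicalization, caps on the number of
    suffixes and prefixes, query strings); this only shows the idea.
    """
    labels = host.split(".")
    hosts = [".".join(labels[i:]) for i in range(max(1, len(labels) - 1))]
    paths = [path] if path == "/" else [path, "/"]
    return [h + p for h in hosts for p in paths]


def needs_full_hash_check(host, path, local_prefixes):
    """True if any candidate's prefix is in the locally cached set.

    Only in that case would a client ask the server for the full hashes
    behind the prefix; a miss is resolved without any network traffic.
    """
    return any(hash_prefix(e) in local_prefixes
               for e in candidate_expressions(host, path))
```

For example, with `local_prefixes = {hash_prefix("evil.example/")}`, a lookup for host `www.evil.example` and path `/landing.html` matches on the `evil.example/` candidate, while an unrelated host misses locally and never touches the network.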