by spohlenz 5884 days ago
> While our crawlers do of course follow links through redirections, the inclusion of modifiable redirects in the stream, and our analysis of the preponderance of spam attempts via these vectors have made it necessary and appropriate in some cases to block the URL shorteners.

Why not do as tkaemming suggests and follow the redirections to link to the final endpoint URL?


The statement you quoted was meant to say that we do exactly that. However, it also noted that in a number of cases, redirects with modifiable destinations are embedded within the chain. In those cases, unless the redirect is re-crawled on every click-through, the integrity of the chain is difficult to guarantee.
Maybe you're misinterpreting me, because the linked post suggests that bit.ly isn't actually doing what I am suggesting.

My proposal is this: when a user submits a link to bit.ly to be shortened, bit.ly follows the link through 0..n redirections until it finds the final endpoint URL. This final endpoint URL is then stored as the bit.ly link.
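A minimal sketch of that proposal in Python. It assumes redirects can be modeled as a lookup table from URL to (status code, `Location` header) so it runs without network access; the table, the example URLs and the name `resolve_final_url` are hypothetical, not bit.ly's actual implementation. A real crawler would issue an HTTP request at each hop instead of consulting a table.

```python
from typing import Dict, Tuple
from urllib.parse import urljoin

# Hypothetical stand-in for the network: maps a URL to (HTTP status, Location header).
# A URL absent from the table is treated as a final endpoint (e.g. a 200 response).
RedirectTable = Dict[str, Tuple[int, str]]

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def resolve_final_url(url: str, table: RedirectTable, max_hops: int = 10) -> str:
    """Follow 0..n redirections and return the final endpoint URL."""
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
        hop = table.get(url)
        if hop is None or hop[0] not in REDIRECT_STATUSES:
            return url  # not a redirect: this is the endpoint to store
        # Location may be relative; resolve it against the current URL.
        url = urljoin(url, hop[1])
    raise ValueError("too many redirects")
```

For example, with the table `{"http://bit.ly/abc": (301, "http://goo.gl/xyz"), "http://goo.gl/xyz": (301, "http://example.com/article")}`, resolving `http://bit.ly/abc` returns `http://example.com/article`, which is what would be stored. The hop limit and the loop check guard against chains that never terminate.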

Of course, this assumes that you don't care about the modifiable destination redirects in the chain, which maybe you do. In that case you would only follow redirects that match a whitelist of followable domains (other link shorteners).
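The whitelist variant could be sketched like this, again assuming a hypothetical URL to (status, `Location`) table in place of real HTTP requests; the `FOLLOWABLE_HOSTS` set and the function name are illustrative only. A hop is followed only while the current URL's host is a whitelisted shortener, so a redirect hosted anywhere else, including one whose destination can be changed, stays in the stored link.

```python
from typing import Dict, FrozenSet, Tuple
from urllib.parse import urljoin, urlsplit

# Hypothetical redirect table: URL -> (HTTP status, Location header).
RedirectTable = Dict[str, Tuple[int, str]]
REDIRECT_STATUSES = {301, 302, 303, 307, 308}

# Hypothetical whitelist of shorteners whose redirects are safe to follow.
FOLLOWABLE_HOSTS: FrozenSet[str] = frozenset({"bit.ly", "goo.gl"})


def resolve_through_whitelist(
    url: str,
    table: RedirectTable,
    followable: FrozenSet[str] = FOLLOWABLE_HOSTS,
    max_hops: int = 10,
) -> str:
    """Follow redirects only while the current URL's host is whitelisted."""
    for _ in range(max_hops + 1):
        host = urlsplit(url).hostname or ""
        hop = table.get(url)
        # Stop at a non-whitelisted host (keep it in the link) or at a non-redirect.
        if host not in followable or hop is None or hop[0] not in REDIRECT_STATUSES:
            return url
        url = urljoin(url, hop[1])
    raise ValueError("too many redirects")  # also catches loops between shorteners
```

With a chain `bit.ly -> goo.gl -> tracker.test -> example.com`, this returns the `tracker.test` URL: both shortener hops are collapsed, but the non-whitelisted redirect is kept rather than resolved.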

Maybe (probably?) there's something I'm missing that makes this infeasible, but it seems like the most logical solution to me.

Isn't that exactly what they do? Some URL shorteners that don't allow modifiable destinations (like goo.gl) are whitelisted; others are not.
In much the same way that we don't frame links, or permanently remove flagged links, we would never return a short link that pointed to anything other than the link requested by the API, or by the user via an interface. It's a simple, deterministic API. The downsides of the proposed approach are far more extreme than any potential upside.
While I see what spohlenz is saying, I think what you've described here is the right way to do it. Once you start changing the target from what I submitted, I would personally be suspicious.
But if the other shortener is returning a 301 'permanent redirect', isn't it fully within the letter and spirit of the HTTP spec to forget it and remember the target?

If the shortener were only returning a 302, then removing it from the chain would be suspect, but with a 301 they are saying 'this link always points here, use it'.
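A sketch of that rule, assuming the same kind of hypothetical URL to (status, `Location`) table rather than live requests: a hop is collapsed only when it is a permanent redirect, and resolution stops at the first temporary one (302, 303 or 307), since a temporary redirect tells clients the target may change. Treating 308 as permanent is an assumption added here; that status code was standardized after this discussion.

```python
from typing import Dict, Tuple
from urllib.parse import urljoin

# Hypothetical redirect table: URL -> (HTTP status, Location header).
RedirectTable = Dict[str, Tuple[int, str]]

# 301 Moved Permanently; 308 Permanent Redirect is the later equivalent (assumption).
PERMANENT = {301, 308}


def collapse_permanent_redirects(url: str, table: RedirectTable, max_hops: int = 10) -> str:
    """Replace url with its target for as long as each hop is a permanent redirect.

    Stops at the first temporary hop, because its destination may change later.
    """
    for _ in range(max_hops + 1):
        hop = table.get(url)
        if hop is None or hop[0] not in PERMANENT:
            return url
        url = urljoin(url, hop[1])
    raise ValueError("too many redirects")
```

So a 301 from one shortener to another is forgotten, but if the next shortener answers with a 302, the stored link stops at that shortener's URL instead of its current target.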

I think the point was to remove all of the intermediate redirects and point every click-through on the bit.ly link straight at the final destination. That way, a spammer reconfiguring their middle-man URL to point somewhere else would have no effect.