by digikata 3172 days ago
This seems like it's vulnerable to some form of abuse.

library.com/books/1as03jf08e/Moby-Dick/

library.com/books/1as03jf08e/Hitchhikers-Guide-to-the-Galaxy

Both now lead to the same place...

2 comments

Eh. You can do that with query strings and hashes in URLs anyway. https://news.ycombinator.com/user?id=digikata&profile=bad-pe...
Standards-wise, you know the part after ? is variable, though...
Variable? I'm not sure I fully get what you're saying, but I do know that https://news.ycombinator.com/user?id=digikataWaitNoThisOther... won't go to the same place as your user profile. There are standards, and then there are "Standards".
You would redirect to the canonical one.
I think the concern is the way it obscures the target. Replace "Moby Dick" with a Chuck Tingle (warning, probably NSFW) book. Now that second link is a serious problem.
I see what you're saying, but it doesn't seem like much more than a funny gag you might pull on a friend.

If a website is concerned about that case, then instead of letting it inform their URL design, they should have a "Warning: Adult content. [Continue] [Back]" interstitial like Reddit or Steam.

I'm not even sure it's a serious problem: a possible annoyance and, for a spammy site owner, maybe even a feature. But as a web user, I'm not really fond of that added uncertainty.
You don't necessarily have to redirect, but you should at least include `<link rel="canonical" href="..." />` (as the given example, Stack Overflow, does) so that search robots and other website clients (scrapers and/or API consumers) know which path is canonical and avoid duplicated effort.
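
For what it's worth, emitting that tag is about one line of templating. A rough Python sketch, where `canonical_link_tag` is a made-up helper name and the `https://library.com/...` URL just reuses the hypothetical site from the top of the thread:

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>.

    canonical_url is assumed to be the one true URL for the resource,
    already looked up by ID elsewhere (hypothetical helper, not any
    particular framework's API).
    """
    # Escape &, <, > and quotes so the URL is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


print(canonical_link_tag("https://library.com/books/1as03jf08e/Moby-Dick/"))
```

Every slug variant of the page would carry the same tag, which is what lets a crawler fold them into one entry.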
That only works for some crawlers. Certainly not for users. Meanwhile, everything obeys redirects.

Since you bring up Stack Overflow, notice that they do the canonical redirect. Change the title in the URL and you'll get redirected.
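
I don't know how Stack Overflow actually implements it, but the general idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `CANONICAL_SLUGS` table, the `canonical_redirect` name, and the `/books/<id>/<slug>/` layout borrowed from the example at the top of the thread.

```python
from typing import Optional

# Hypothetical catalogue mapping each book ID to its one canonical slug.
CANONICAL_SLUGS = {"1as03jf08e": "Moby-Dick"}


def canonical_redirect(path: str) -> Optional[str]:
    """Return the canonical path to redirect to, or None if no redirect applies.

    Only the ID segment identifies the book; the slug is decorative, so any
    request whose path differs from the canonical one gets redirected.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "books":
        return None  # not a book URL
    book_id = parts[1]
    slug = CANONICAL_SLUGS.get(book_id)
    if slug is None:
        return None  # unknown ID; a real site would answer 404 here
    canonical = f"/books/{book_id}/{slug}/"
    return None if path == canonical else canonical


# The misleading link from the top of the thread resolves to the real title.
print(canonical_redirect("/books/1as03jf08e/Hitchhikers-Guide-to-the-Galaxy"))
```

A server would answer a non-None result with a 301 and that path in the Location header, so the user ends up seeing the real title in the address bar.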

Yes, the best approach is probably both. But knowing the canonical path matters more for crawlers than for users, and a crawler that ignores rel="canonical" is likely about as buggy as one that ignores robots.txt; it's a specification they ignore at their own peril.