| HN Mirror



	by ot 4693 days ago
	Technically speaking, there are no equivalent URLs in general: different strings may lead to different resources. Still, there are a number of common-sense heuristics for normalizing URLs that HN applies for de-duplication. I was wondering what the rationale is for not including trailing-slash removal among them. I mean, is there any legitimate website that serves a different resource if you remove the trailing slash?

3 comments

gsnedders 4693 days ago

Per RFC 3986/3987, http://example.com/%61 and http://example.com/a are equivalent. (Indeed, all major browsers will request the latter regardless of what is input.) Equally, a Punycode-encoded IRI and the original IRI are equivalent. There is a whole section on equivalence in both of the RFCs (3987 includes 3986 by reference, so is a superset).
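To make the percent-encoding case concrete, here is a minimal Python sketch of the kind of syntax-based normalization RFC 3986 section 6.2.2 describes: escapes of unreserved characters are decoded, the remaining escapes get uppercase hex digits, and the scheme and host are lowercased. The function name `normalize` and the helper names are made up for illustration; this is not HN's actual de-duplication code, and it deliberately leaves out Punycode conversion and dot-segment removal.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def _fix_escape(match):
    """Decode an escape of an unreserved character; otherwise uppercase its hex digits."""
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char
    return "%" + match.group(1).upper()


def normalize(url):
    """Hypothetical helper: apply a subset of RFC 3986 syntax-based normalization."""
    parts = urlsplit(url)
    # Only the host is case-insensitive; any userinfo before "@" is left alone.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    path = _PCT.sub(_fix_escape, parts.path)
    query = _PCT.sub(_fix_escape, parts.query)
    return urlunsplit((parts.scheme.lower(), netloc, path, query, parts.fragment))


if __name__ == "__main__":
    print(normalize("http://example.com/%61"))  # 'a' is unreserved, so it is decoded
    print(normalize("http://example.com/%60"))  # '`' is not unreserved, so it stays encoded
```

Two submissions would then count as the same URL when their normalized strings compare equal. Note that nothing here touches the trailing slash, since the spec does not treat that as an equivalence.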

U2EF1 4693 days ago

Browser equivalence is another thing entirely. Most browsers will accept http://www。google。com (because in Japanese '。' is '.'). But if you tried to request that actual resource, it wouldn't lead anywhere.

But yeah HN should just use browser equivalence.

gsnedders 4692 days ago

But that is totally undefined, and all browsers do their own thing for what's entered in the address bar. There's more consistency in how URLs in content are handled: that doesn't do stuff like normalising '。', but it does handle the percent-encoded case (for unreserved characters, as the spec says).

Following what the spec says for equivalence makes sense, at least. Anything more drastic is technically treating distinct URLs as equivalent.

keeperofdakeys 4693 days ago

Without actually checking redirects, it is much better to tolerate a few duplicated submissions than to break a few edge cases.

zeckalpha 4693 days ago

Or it could check for a 3xx HTTP status.
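A rough sketch of what such a check might look like in Python, using only the standard library. The names `redirect_target` and `probe` are hypothetical, and the choice of a HEAD request is an assumption (some servers answer HEAD differently from GET); this is not a description of what HN does.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def redirect_target(url, status, location):
    """Return the absolute redirect target, or None if the response is not a redirect."""
    # A 3xx without a Location header (e.g. 304 Not Modified) is not a redirect to follow.
    if 300 <= status < 400 and location:
        return urljoin(url, location)
    return None


def probe(url, timeout=5.0):
    """Send one HEAD request (http.client never follows redirects) and report the target."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return redirect_target(url, response.status, response.getheader("Location"))
    finally:
        conn.close()
```

If `probe("http://example.com/dir")` returned `"http://example.com/dir/"`, the two forms could be treated as one submission; if it returned `None`, the trailing slash would be left alone. The cost is a network round trip per submission.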