by sutterd 34 days ago
Doh! The part past the # does not go to the server, so that wasn't a longer URL. How about:

https://chrismorgan.info/%6e%6f-%71%75%65%72%79-%73%74%72%69...


Indeed, that's not a query string! The # and the text following it form a fragment, which is client-side only and isn't the subject of the blog post. Neither is percent encoding, which is just another way to send the exact same path from your browser to the server.
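To make the path/query/fragment distinction concrete, here is a rough sketch using Python's standard urllib.parse. The host example.com, the query x=1 and the fragment name are placeholders I made up; only the percent-encoded letters come from the visible part of the link above.

```python
from urllib.parse import urlsplit, unquote

# Placeholder URL: example.com, "x=1" and "section" are illustrative only.
url = "https://example.com/%6e%6f-%71%75%65%72%79?x=1#section"
parts = urlsplit(url)

path = parts.path          # sent to the server, still percent-encoded
query = parts.query        # sent to the server
fragment = parts.fragment  # kept by the browser, not sent in the request

# Decoding escapes that only cover plain letters yields the same path text.
decoded_path = unquote(path)
print(path, query, fragment, decoded_path)
```

Only the path and query end up in the request line; the fragment is the part a server never sees.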

Note that it has nothing to do with the length of the URL. That's just the error message he's chosen to use, because "4xx stop pissing about with my URLs" doesn't exist in the spec.

> percent encoding, which is just another way to send the exact same path

This is not true for all characters. Some can only be expressed by percent-encoding, and decoding them will either break things completely (e.g. %20) or change the meaning of the URL (e.g. %2F, %3F in paths).

Yes, you can encode x as %78 and it should work identically, and you can decode %78 to x and it should work identically—though in both cases, I reckon there’s a strong case for blocking the request as suspicious, and I will probably start doing that soon.
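For what it's worth, a check along those lines could look roughly like the sketch below. This is my own guess at one way to do it, not what the blog actually does: the function name needlessly_encoded is hypothetical, and the "unreserved" set is the one from RFC 3986 (letters, digits, "-", ".", "_", "~").

```python
import re

# Unreserved characters per RFC 3986: these never need percent-encoding.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def needlessly_encoded(path: str) -> list[str]:
    """Return the percent-escapes in `path` that decode to unreserved characters."""
    return [
        m.group(0)
        for m in PCT_ESCAPE.finditer(path)
        if chr(int(m.group(1), 16)) in UNRESERVED
    ]

# A hypothetical server could reject any request where this list is non-empty.
print(needlessly_encoded("/%78"))              # escape for the letter x
print(needlessly_encoded("/foo%2Fbar%20baz"))  # escapes that are needed
```

Escapes such as %2F and %20 pass through untouched, since decoding those would change or break the URL.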

But take these examples of improper decoding:

• /foo%2Fbar/baz.html has path «"foo/bar", "baz.html"».

• /foo/bar/baz.html has path «"foo", "bar", "baz.html"».

• /foo%3Fbar/baz?quux has path «"foo?bar", "baz"» and query "quux".

• /foo?bar/baz?quux has path «"foo"» and query "bar/baz?quux".
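The four cases above can be reproduced with a small sketch that splits first and decodes second. The helper name split_url is mine, and this assumes Python's urllib.parse behaves like the parser being described, which seems to hold for these inputs.

```python
from urllib.parse import urlsplit, unquote

def split_url(url: str) -> tuple[list[str], str]:
    """Split into path segments and query, decoding each segment only after splitting."""
    parts = urlsplit(url)
    # Drop the empty string before the leading "/", then decode per segment.
    segments = [unquote(s) for s in parts.path.split("/")[1:]]
    return segments, parts.query

for url in (
    "/foo%2Fbar/baz.html",
    "/foo/bar/baz.html",
    "/foo%3Fbar/baz?quux",
    "/foo?bar/baz?quux",
):
    print(url, split_url(url))
```

Decoding the whole URL before splitting would collapse the first case into the second and the third into the fourth, which is exactly the change of meaning being described.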

Indeed, it's essential in some cases. I was talking in the context of sutterd's suggestion, where only lower-case letters have been encoded.

> strong case for blocking the request as suspicious

Yep, as there shouldn't be any "normal" reason to do such a thing.