Hacker News

by likium 1770 days ago

Even if you built a URL validation regex that follows rfc3986[1] and rfc3987[2], you will still get user bug reports because web browsers follow a different standard.

For example, <http://example.com./>, <http:///example.com/> and <https://en.wikipedia.org/wiki/Space (punctuation)> are classified as invalid URLs in the blog, but browsers accept them.
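To make the mismatch concrete, here is a minimal Python sketch of a strict checker that rejects all three of those inputs. It is hypothetical and not the blog's regex: `strict_is_valid` and `LABEL` are illustrative names, and the sketch assumes http(s) only, an RFC 1123-style letter/digit/hyphen rule for host labels, and no raw whitespace anywhere in the string.

```python
import re
from urllib.parse import urlsplit

# Hypothetical host-label rule (RFC 1123 style): 1-63 letters, digits or
# hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def strict_is_valid(url: str) -> bool:
    """Hypothetical strict check that rejects forms browsers happily load."""
    if any(ch.isspace() for ch in url):
        return False  # RFC 3986 requires spaces to be percent-encoded
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False  # "http:///example.com/" parses with an empty authority
    # Every dot-separated label must be non-empty, so "example.com." fails.
    return all(LABEL.fullmatch(label) for label in parts.hostname.split("."))


for candidate in [
    "http://example.com./",
    "http:///example.com/",
    "https://en.wikipedia.org/wiki/Space (punctuation)",
    "https://en.wikipedia.org/wiki/Space_(punctuation)",
]:
    print(candidate, strict_is_valid(candidate))
```

The first three come out False even though a browser will load each of them; only the underscore form passes. Deciding how far to relax rules like these to match browser behavior is where the user bug reports come from.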

As the creator of cURL puts it, there is no URL standard[3].

[1]: https://www.ietf.org/rfc/rfc3986.txt

[2]: https://www.ietf.org/rfc/rfc3987.txt

[3]: https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/

yyyk 1770 days ago

<http://example.com./> is a valid URL; see, for example:

https://jdebp.uk/FGA/web-fully-qualified-domain-name.html
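A hostname check that agrees with this treats a single trailing dot as marking the fully qualified (absolute) form of the name. A minimal Python sketch, assuming plain ASCII letter/digit/hyphen hostnames (no IDN handling) and a hypothetical helper named `is_valid_hostname`:

```python
import re

# Assumed label rule: 1-63 ASCII letters, digits or hyphens, no leading or
# trailing hyphen (internationalized names would need IDNA encoding first).
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(host: str) -> bool:
    """Hypothetical check that accepts an optional single trailing dot."""
    if host.endswith("."):
        host = host[:-1]  # "example.com." is the fully qualified form
    if not host or len(host) > 253:
        return False
    # Any remaining empty label ("example..com", "example.com..") fails.
    return all(LABEL.fullmatch(label) for label in host.split("."))


print(is_valid_hostname("example.com."))   # fully qualified form
print(is_valid_hostname("example.com"))    # relative form
print(is_valid_hostname("example.com.."))  # two trailing dots
```

Only one trailing dot is stripped, so the first two calls succeed and the third fails.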

MildlySerious 1770 days ago

Tangentially, a YouTube bug surfaced last year where adding that extra dot let you avoid all ads. Previous discussion: [1]

[1]: https://news.ycombinator.com/item?id=23479435
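One plausible mechanism behind bugs like this (an assumption here, not something the thread establishes) is code that compares hostnames as exact strings, so `example.com.` fails to match `example.com`. A minimal Python sketch contrasting a naive comparison with one that drops a single trailing dot first; `naive_same_host`, `canonical_host` and `same_host` are hypothetical names:

```python
from urllib.parse import urlsplit


def naive_same_host(a: str, b: str) -> bool:
    # Exact string comparison: "example.com" and "example.com." differ.
    return urlsplit(a).hostname == urlsplit(b).hostname


def canonical_host(url: str) -> str:
    host = urlsplit(url).hostname or ""
    # Drop one trailing dot so the absolute and relative forms compare equal.
    return host[:-1] if host.endswith(".") else host


def same_host(a: str, b: str) -> bool:
    return canonical_host(a) == canonical_host(b)


a = "https://www.example.com/watch"
b = "https://www.example.com./watch"
print(naive_same_host(a, b), same_host(a, b))
```

Browsers keep the dot in the host, so the two forms are different origins; whether your own code should treat them as the same site is a policy choice.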

userbinator 1769 days ago

This "bug" can definitely also be known as a feature ;-)

dhsysusbsjsi 1770 days ago

Also nearly every paywalled media site

Sephr 1770 days ago

There might not have been a generally accepted standard then, but there is now: https://url.spec.whatwg.org/
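Python's `urllib.parse` is not a WHATWG URL parser, so it does not reproduce browser behavior. As a rough illustration of how the WHATWG spec is more lenient, here is a hypothetical `browserish_normalize` helper that mimics just two of its behaviors for http(s) URLs: skipping any run of slashes or backslashes before the host, and percent-encoding spaces in the path. It is a sketch of a tiny subset, not a conforming implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def browserish_normalize(url: str) -> str:
    """Hypothetical helper mimicking a tiny subset of WHATWG URL parsing."""
    scheme, sep, rest = url.strip().partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in ("http", "https"):
        raise ValueError("this sketch only handles http(s) URLs")
    # For special schemes the spec skips any run of "/" or "\" before the
    # host, so "http:///example.com/" and "http:example.com" both get a host.
    rest = "//" + rest.lstrip("/\\")
    parts = urlsplit(scheme + ":" + rest)
    # Spaces in the path are percent-encoded rather than rejected, and an
    # empty path becomes "/".
    path = parts.path.replace(" ", "%20") or "/"
    return urlunsplit((scheme, parts.netloc, path, parts.query, parts.fragment))


print(browserish_normalize("http:///example.com/"))
print(browserish_normalize("https://en.wikipedia.org/wiki/Space (punctuation)"))
```

In a browser or Node.js, `new URL(input).href` applies the full algorithm; the sketch only shows why all three URLs from the top comment load there.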

jt2190 1770 days ago

There's also a question of what we're really trying to validate, IMHO. All of these regex patterns will tell you that a string looks like a URL, but they won't tell you whether there's any web server listening at that URL, whether that server has the resource at that location, whether the server is reachable from where you want to fetch it, etc.
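A minimal Python sketch separating the two questions: a purely syntactic `looks_like_url` check versus an `is_reachable` check that actually sends a HEAD request. Both names are hypothetical, and the sketch assumes a non-error answer to HEAD is good enough evidence; some servers reject HEAD, and the result depends on where the code runs.

```python
import http.client
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def looks_like_url(s: str) -> bool:
    """Syntactic check only: an http(s) scheme and a non-empty host."""
    parts = urlsplit(s)
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Network check: does a server at this URL answer a HEAD request?"""
    try:
        req = urllib.request.Request(url, method="HEAD")
        # urlopen follows redirects and raises HTTPError for 4xx/5xx statuses.
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return False  # a server answered, but e.g. 404 or 405 (HEAD refused)
    except (urllib.error.URLError, http.client.HTTPException, ValueError, OSError):
        return False  # bad URL type, DNS failure, refused, timed out, ...
```

`looks_like_url("https://example.com/no-such-page")` returns True without touching the network; only `is_reachable` can say whether anything is actually served there, and only from the machine it runs on.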

staticassertion 1770 days ago

> All of these regex patterns will tell you that a string looks like a URL,

Yeah, that's it. That's what they're trying to validate.

MPSimmons 1769 days ago

It seems like the answer is almost always yes.
