by steveklabnik 3127 days ago
A URL parser takes a string containing a URL and returns a data structure that represents the URL's component parts.
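To make that concrete, here is a minimal sketch using `urlsplit` from Python's standard-library `urllib.parse`. The example URL is made up for illustration, and this is just one parser among many, not a reference implementation.

```python
from urllib.parse import urlsplit

# Hypothetical example URL, chosen only for illustration.
url = "https://user@example.com:8080/a/b?x=1#frag"

# urlsplit returns a named tuple with one field per URL component.
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.username)  # user
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /a/b
print(parts.query)     # x=1
print(parts.fragment)  # frag
```

Even this tidy case hides decisions (where userinfo ends, whether the port is a number or a string), which is where the complexity starts.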

It's complex because URLs are complex. I believe the correct RFC is RFC 3986 (https://tools.ietf.org/html/rfc3986), which is 60 pages long.

(That said, page length is only a proxy for complexity, of course.)

2 comments

And, as a bonus, there's the other URL standard, which describes what browsers actually do:

https://url.spec.whatwg.org/
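One small illustration of the gap, as a sketch: the WHATWG standard treats a backslash like a forward slash in http(s) URLs, which is what browsers do, whereas Python's `urllib.parse.urlsplit` only splits on the generic delimiters. The input below is hypothetical, and the behaviour shown is that of CPython's standard library; other parsers may differ.

```python
from urllib.parse import urlsplit

# Hypothetical input; the backslash is the interesting part.
# A browser following the WHATWG standard would see host "example.com"
# and path "/path" here.
parts = urlsplit("http://example.com\\path")

# urlsplit does not treat "\" as a path separator, so everything
# after "//" ends up in the network location and the path is empty.
print(repr(parts.netloc))  # 'example.com\\path'
print(repr(parts.path))    # ''
```

Two parsers disagreeing about where the host ends is exactly the sort of mismatch that causes trouble in practice.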

Speaking as someone who once tried to write the parsing code myself to avoid pulling in a dependency: never again. It's not just that the spec is 60 pages long, but that actual behaviour out in the real world is miles away from the spec. The web is a complex place where standards are... rarely standard.

When writing code, it's a much better idea to follow https://url.spec.whatwg.org/