by azalemeth
1722 days ago
Honest question: there is a famous and very funny Stack Overflow answer on the topic of parsing HTML with a regex [1], which states that the problem is in general impossible, and that if you find yourself doing this, something has gone wrong and you should re-evaluate your life choices / pray to Cthulhu. So, does this apply to URLs? The fact that these regexes are... so huge... makes me think that something is fundamentally wrong. Are URLs describable by a Chomsky Type 3 (regular) grammar? Are they sufficiently regular that using a regex is sensible? What do the actual browsers do?
[1] https://stackoverflow.com/questions/1732348/regex-match-open...
URLs are not recursive structures, so I'd say the single hardest feature of HTML, arbitrarily nested elements, is not present.
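To make the "not recursive" point concrete, here is a minimal sketch using the component-splitting regex given in RFC 3986, Appendix B. The helper name `split_uri` and the dict layout are my own choices for illustration, not anything from the RFC, and note this only splits a URI reference into parts; it does not validate it, which is where the huge regexes come from.

```python
import re

# Regex from RFC 3986, Appendix B: splits a URI reference into components.
# It is flat (no nesting), which is why a regular expression suffices here.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Hypothetical helper: return the five generic URI components.

    Every group in the pattern is optional, so the match itself always
    succeeds; components that are absent come back as None.
    """
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

parts = split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related")
print(parts)
```

As for what browsers do, as far as I know they don't use one big regex at all: the WHATWG URL Standard describes parsing as a hand-written state machine.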