by Forge36 2011 days ago
It looks straightforward until you hit a couple of edge cases. Examples:

test <1 becomes test 1

Test< 2 becomes test 2

Test <a becomes test

Test < b becomes test b

(From memory)

What about: Test <fakeTag>?

In the tests I did, "test " was expected; however, "test <fakeTag>" came out as the plaintext version, suggesting there's a list of valid tags filtering the behavior.

2 comments

That's because '<' needs to be followed by [!/?a-zA-Z] to be recognised as a tag start. Otherwise it is a literal '<'.

The full details are in here somewhere: https://www.w3.org/TR/2011/WD-html5-20110113/tokenization.ht...
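
As a rough illustration of that rule, here is a minimal Python sketch. The function name `classify_lt` is made up for this example, and it only checks the single character after each '<'; it does not tokenise the rest of the tag or implement the remainder of the spec's state machine.

```python
import re

# Per the rule above: '<' starts markup only when the next character
# is '!', '/', '?' or an ASCII letter. Anything else is a literal '<'.
TAG_START = re.compile(r"<[!/?a-zA-Z]")

def classify_lt(text):
    """Return (index, kind) for each '<' in text.

    kind is 'markup' if the '<' would begin a tag, comment, or similar
    construct, and 'literal' if it is treated as plain text.
    (Hypothetical helper, for illustration only.)
    """
    results = []
    for i, ch in enumerate(text):
        if ch == "<":
            kind = "markup" if TAG_START.match(text, i) else "literal"
            results.append((i, kind))
    return results

for sample in ["test <1", "Test< 2", "Test <a", "Test < b", "Test <fakeTag>"]:
    print(repr(sample), classify_lt(sample))
```

Under this check, the '<' in "test <1", "Test< 2" and "Test < b" is literal, while "Test <a" and "Test <fakeTag>" both start a tag. That fits the rule as stated: the tokenizer does not consult a list of known tag names, so an unknown name like fakeTag still opens a tag.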

I have been stuck on edge cases like these for almost 15 years while building my own HTML parser.

It always works on all the HTML files I have, but then people make new HTML files with other issues.