by Tainnor 1137 days ago
HTML is not a regular language, so it can't be recognised by a "theoretical" regular expression of the kind introduced in a theoretical CS class. Modern regex engines, however, are more powerful: features such as backreferences and recursion let them recognise some non-regular languages too.
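To make that concrete, here is a minimal sketch in Python (my choice of language, the comment doesn't name one). It uses a backreference to recognise the "doubled string" language { ww : w non-empty }, which is a textbook example of a language that is not regular. The function name `is_doubled` is just made up for illustration.

```python
import re

# (.+) captures some non-empty string w; \1 then demands the exact same text again.
# Backreferences are what take this beyond "theoretical" regular expressions.
DOUBLED = re.compile(r"(.+)\1")

def is_doubled(s: str) -> bool:
    """Return True if s has the form ww for some non-empty w (hypothetical helper)."""
    return DOUBLED.fullmatch(s) is not None

print(is_doubled("abab"))  # "ab" followed by "ab"
print(is_doubled("aba"))   # odd length, cannot be split into two equal halves
```

Note this only shows that one engine feature escapes the regular languages; it says nothing about whether matching HTML this way is a good idea.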

Then there's a distinction to be made between recognising a language (deciding whether a string belongs to it) and parsing it (recovering the string's structure).
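A rough sketch of that difference, using balanced parentheses as a stand-in for nested tags (the toy language and the names `recognise` and `parse` are my own assumptions, not from the article): the recogniser only answers yes or no, while the parser hands back the nesting as a tree.

```python
def recognise(s: str) -> bool:
    """Recogniser: is s a balanced string of parentheses? Yes/no only."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
        else:
            return False   # any other character is outside the toy language
    return depth == 0

def parse(s: str) -> list:
    """Parser: return the nesting structure as nested lists, or raise ValueError."""
    root: list = []
    stack = [root]         # stack[-1] is the node currently being filled
    for ch in s:
        if ch == "(":
            node: list = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unclosed '('")
    return root

print(recognise("(()())"))  # True
print(parse("(()())"))      # [[[], []]]
```

Both functions accept exactly the same strings; only the parser tells you which bracket closes which.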

This article goes into more detail: https://www.npopov.com/2012/06/15/The-true-power-of-regular-...