The regex is surely faster for this specific case. Offhand, I can't think of an XHTML parser that lets me stop parsing right after the start tag. Perhaps a lazy parser could start to compete, but I'm just guessing.
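For illustration, here's a rough Python sketch of both approaches to grabbing only the first start tag. The pattern `START_TAG` and the helper names are hypothetical, and the regex is deliberately naive: it ignores comments, CDATA, and `>` inside attribute values. The parser version uses the standard library's `html.parser.HTMLParser`, which has no "stop" call, so it bails out by raising an exception from `handle_starttag`.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately naive pattern for the first opening tag.
# It does not understand comments, CDATA, or '>' inside attribute values.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def first_tag_regex(doc):
    """Return the lowercased name of the first start tag, or None."""
    m = START_TAG.search(doc)
    return m.group(1).lower() if m else None


class _StopParsing(Exception):
    """Raised from a handler to abandon the rest of the input."""


class _FirstTagParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tag = None

    def handle_starttag(self, tag, attrs):
        self.tag = tag  # HTMLParser already lowercases tag names
        raise _StopParsing

    # The default handle_startendtag calls handle_starttag first,
    # so self-closing tags like <br/> also stop the parse here.


def first_tag_parser(doc):
    """Same result as first_tag_regex, but via an event-based HTML tokenizer."""
    parser = _FirstTagParser()
    try:
        parser.feed(doc)
    except _StopParsing:
        pass  # stopped early on purpose
    return parser.tag
```

Both stop at the first tag, but the parser still runs a general-purpose tokenizer (declarations, character references and so on) to get there, which is presumably where the regex gets its edge.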
Aren't most XML parsers SAX or StAX based? The only time I've run into a library that offered just a full DOM, without an underlying event-based parser, was whatever browsers consider the JavaScript standard library.
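To make the event-based point concrete, here's a minimal SAX sketch using Python's standard `xml.sax` module. SAX has no "stop" method either, so the handler (the hypothetical `FirstStartTagHandler`) raises an exception from `startElement`. That aborts the parse as soon as the first start tag is reported, so the input after that tag is never checked, even for well-formedness.

```python
import xml.sax


class FoundStartTag(Exception):
    """Carries the first start tag out of the SAX callback and aborts the parse."""

    def __init__(self, name, attrs):
        super().__init__(name)
        self.name = name
        self.attrs = attrs


class FirstStartTagHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        # Copy the attributes into a plain dict before leaving the callback.
        raise FoundStartTag(name, dict(attrs.items()))


def first_start_tag(xml_text):
    """Return (name, attrs) of the root element's start tag.

    Malformed input *before* the root element still raises SAXParseException
    through the default error handler.
    """
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), FirstStartTagHandler())
    except FoundStartTag as found:
        return found.name, found.attrs
```

A StAX-style pull API (in Python, `xml.etree.ElementTree.XMLPullParser`) gets the same effect more naturally, since the caller simply stops asking for events.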
You're totally right! Many good stock parsers already stream things (more or less).
Still, my point was only about the overhead... I'd hazard a guess that an HTML parser will have a hard time beating a regex on speed, assuming what you want can be done with both.
This is all irrelevant, because, as the OP mentions, the SO question at hand cannot be solved with standards-compliant parsers: self-closing tags are not distinguishable in the parser's output.
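A quick way to see the problem, sketched with Python's `xml.sax` (the helper names are made up for illustration): the event stream for `<br/>` is identical to the one for `<br></br>`, because XML treats an empty-element tag as just another way to write an empty element. A regex working on the raw text still sees the slash, so it can tell them apart. The pattern below is a naive, hypothetical one that would trip over comments, CDATA, and `>` inside attribute values.

```python
import re
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records start/end element events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(xml_text):
    recorder = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), recorder)
    return recorder.events


# Hypothetical naive pattern: an opening tag whose '>' is not preceded by '/'.
OPEN_NOT_SELF_CLOSING = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*(?<!/)>")


def open_tags(text):
    return OPEN_NOT_SELF_CLOSING.findall(text)


self_closing = "<p><br/></p>"
explicit_pair = "<p><br></br></p>"

# The parser reports the same events for both documents...
print(sax_events(self_closing) == sax_events(explicit_pair))  # True
# ...while the regex on the raw text can tell them apart.
print(open_tags(self_closing), open_tags(explicit_pair))  # ['p'] ['p', 'br']
```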