| HN Mirror



	by nixpulvis 1870 days ago
	The regex is surely faster for this specific case. I can't say I've seen an XHTML parser offhand that allows me to stop parsing after just the start tag. Perhaps a lazy parser could start to compete, but I'm just guessing.
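For concreteness, here is a rough sketch of the kind of regex under discussion: one that finds opening tags while skipping self-closing ones. The pattern and the `open_tags` helper are illustrative assumptions, not the regex from the original question, and the pattern will misfire on edge cases such as a `>` inside a quoted attribute value.

```python
import re

# Hypothetical pattern: "<", a tag name, any non-bracket characters,
# then ">" that is NOT immediately preceded by "/" (so "<br/>" is skipped).
OPEN_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*?(?<!/)>")

def open_tags(markup):
    """Return the names of opening tags that are not self-closing."""
    return OPEN_TAG.findall(markup)

print(open_tags("<p><br/><a href='x'>t</a></p>"))  # ['p', 'a']
```

Swapping `findall` for `OPEN_TAG.search(markup)` would stop at the first match, which is the "stop after just the start tag" behaviour mentioned above.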

2 comments

josefx 1870 days ago

Aren't most XML parsers SAX or StAX based? The only time I ran into a library that offered just a full DOM, without the underlying event-based parser, was whatever browsers consider the JavaScript standard library.
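As a rough illustration of the event-based style, here is a sketch using Python's standard-library `XMLPullParser` (chosen only as a convenient pull parser; it is neither SAX nor StAX itself). The `first_start_tag` helper and the chunk size are assumptions for the example. It feeds the input in small chunks and returns as soon as the first start event arrives, so the rest of the document is never read.

```python
import io
from xml.etree.ElementTree import XMLPullParser

def first_start_tag(stream, chunk_size=64):
    """Return (tag, attributes) of the first start tag, or None if there is none."""
    parser = XMLPullParser(events=("start",))
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None  # input ended before any start tag was seen
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            # Stop at the very first start event; later input is never fed.
            return elem.tag, dict(elem.attrib)

data = b'<root id="1">' + b"<item/>" * 10000 + b"</root>"
doc = io.BytesIO(data)
print(first_start_tag(doc))     # ('root', {'id': '1'})
print(doc.tell(), "of", len(data), "bytes read")
```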


nixpulvis 1869 days ago

You're totally right! Many good stock parsers already stream things (more or less).

Still, I'm just making a comment about the overhead... I would hazard a guess that you're going to have a hard time beating a regex with an HTML parser for speed, assuming what you want can be done with both.

This is all irrelevant, because as the OP mentions, the SO question at hand cannot be solved with standards-compliant parsers: self-closing tags will not be distinguishable from an empty open/close pair.
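A small sketch of that last point, again using Python's standard-library `XMLPullParser` as one example of a conforming XML parser (the `events_for` helper is made up for the illustration). Both spellings of an empty element produce the same event stream, so code sitting on top of the parser cannot tell which one was in the source.

```python
from xml.etree.ElementTree import XMLPullParser

def events_for(xml_text):
    """Return the (event, tag) pairs the parser reports for xml_text."""
    parser = XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()  # events not yet retrieved can still be read after close()
    return [(event, elem.tag) for event, elem in parser.read_events()]

print(events_for("<a/>"))                           # [('start', 'a'), ('end', 'a')]
print(events_for("<a/>") == events_for("<a></a>"))  # True
```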


Akronymus 1869 days ago

I believe you could build such a parser out of Parsec. Although I am not sure if that is exactly what you are going for.
