| HN Mirror

	by arthur2e5 1790 days ago
	Regarding the XML thing: the author did mention later how complex a compliant parser is, so presumably tests have been… done. The “HTML/” before the XML is what really raised my eyebrows, since not many people besides browser vendors implement optional tags and tag-soup recovery correctly. (At least it’s standardized in the spec now.) Alright, looks like the self-brewed parser is explicitly for CHM HTML4 and EPUB XHTML only, with the more serious stuff in MuPDF. I could still mess with it by packing my own CHM (HTML4 has optional tags for ergonomics too), but that sounds very boring.
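
	To illustrate the optional-tag point, here is a minimal sketch using Python's standard-library `html.parser`, which only tokenizes and does not infer implied end tags. It is not the parser under discussion; the `TagLogger` class, `tag_events` helper, and `SAMPLE` markup are made up for this example. The markup is valid HTML4, yet `</li>` and `</p>` never show up as events, so a parser that expects balanced tags would have to work out where those elements close on its own.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records start/end tag events exactly as they appear in the markup."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Return the list of (kind, tag) events seen while tokenizing markup."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Valid HTML4: the end tags for <li> and <p> are optional and omitted here.
SAMPLE = "<ul><li>one<li>two</ul><p>first<p>second"

events = tag_events(SAMPLE)
for kind, tag in events:
    print(kind, tag)
```

	The tokenizer reports two `li` starts and two `p` starts with no matching ends; closing them at the right place is the tree-building step that browsers get right and ad hoc parsers often skip.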