

by pwdisswordfish9 1254 days ago

> Parsing HTML isn’t trivial: aside from bad/invalid HTML (think: missing opening/closing tags, quotes, etc), there’s also a lot of content that requires javascript to render in the first place, for example, which means the page needs to be rendered and have access to window and DOM, etc.

Double standard. A fair comparison has to compare like with like: you need to compare only the subset of HTML that gives you what you can also get from a screenshot. It makes no sense to hold the performance penalty of script execution against browser runtimes when (a) you don't have to execute any scripts to reach parity with a static image, and (b) a static image can't do anything like what executable scripts enable.
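To make point (a) concrete, here's a rough sketch (not anything from the article) of pulling the visible text out of sloppy HTML without executing a single script. It assumes Python's standard-library `html.parser`; the names `TextExtractor`, `visible_text` and the `sample` markup are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores script/style bodies. Nothing is executed."""

    SKIP = {"script", "style"}  # elements whose contents are not visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text content of `html`, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return " ".join(parser.chunks)


# Hypothetical input with unclosed tags and an inline script.
sample = "<p>Hello <b>world<p>unclosed tags<script>document.write('x')</script> still parse"
print(visible_text(sample))
```

The missing closing tags don't stop the parser, and the script body is just skipped as data. That's roughly the text you'd get from a screenshot, minus the OCR.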

And whether or not parsing HTML is trivial (which is debatable), it still doesn't take strictly more computational resources than the kind of computer vision and widgetry that lets you e.g. select the text in a screenshot...