One way to dedupe inline image resources while still using HTML rather than MHTML could be to encode each image once in CSS and replace the image element with an element that uses that class.
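To make that concrete, here is a rough sketch of the idea in Python, not a tested tool. It assumes the images are already inlined as `data:` URIs in double-quoted `src` attributes, and it uses a regex instead of a real HTML parser just to stay short. The function name `dedupe_inline_images` and the `inline-img-N` class names are made up for illustration. It also drops `alt` text and sizing, so a real version would need to carry width and height over to the replacement element.

```python
import re

# Matches <img> tags whose src is a data: URI in double quotes.
# A regex keeps the sketch short; a real tool should use an HTML parser.
IMG_RE = re.compile(r'<img\b[^>]*?\bsrc="(data:[^"]+)"[^>]*>', re.IGNORECASE)


def dedupe_inline_images(html: str) -> str:
    """Move each distinct data: URI into one CSS rule and reference it by class."""
    classes = {}  # data URI -> hypothetical class name

    def replace(match):
        uri = match.group(1)
        if uri not in classes:
            classes[uri] = f"inline-img-{len(classes)}"
        # Assumption: a <span> with a background image stands in for the <img>.
        return f'<span class="{classes[uri]}" role="img"></span>'

    body = IMG_RE.sub(replace, html)
    rules = "\n".join(
        f'.{name} {{ display: inline-block; background: url("{uri}") no-repeat; }}'
        for uri, name in classes.items()
    )
    # Each image payload now appears once, in the stylesheet.
    return f"<style>\n{rules}\n</style>\n{body}"
```

Two `<img>` tags that share the same payload end up as two `<span>` elements pointing at a single CSS rule, so the base64 data is stored only once in the file.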
Good point. I was thinking along the lines of something I'm tinkering with in a similar area. There, getting a static snapshot of the current DOM or a fragment is key (meaning that stripping out scripts is an intentional feature). Tweaking the document contents for efficiency could significantly affect a lot of the script work that may be present.