Hacker News new | ask | show | jobs
by flatroze 2498 days ago
Thank you! It's pretty straight-forward: this program just retrieves assets and converts them into data-URLs (data:...), then replaces the original href/src attribute value, so in case with the same image being linked multiple times, monolith will for sure bloat the output with the same base64 data, correct. I haven't looked into MHTMTL, ashamed to admit it's the first time I'm hearing about that format. I need to do some research, maybe I could improve monolith to overcome issues related to file size, thank you for the tip!

And about Rust: I think you're way ahead of me here as well, this is my first Rust program. If you're talking about it embedding some debug info into the binary which may include things like /home/dataflow then perhaps there's a compiler option for cargo or a way to strip the binary after it's compiled. ¯\_(ツ)_/¯ Sorry, that's the best I can tell at the moment.

2 comments

Okay thanks! That was a pretty quick reply :) Regarding MHTML, it's basically the MIME format emails are in (which are basically inherently self-contained HTML documents). Various browsers have had varying degrees of support for it over the years. Chrome recently made it harder to save in MHTML format; I don't know how long they will be able to read the format, so I can't guarantee that if you go in that direction it'll still be useful for a long time, but at the moment there is still some support for it.
One way to dedupe inline image resources while still using HTML rather than MHTML, could be to encode them in css once, and transform the image element to something with that class.
That'd easily break Javascript though.
Good point. I was thinking in the direction of something I'm tinkering with in a similar area. There getting a static snapshot of the current DOM or fragment is key (meaning scripts being stripped out is an intentional feature). Tweaking the document contents for efficiency could significantly impact a lot of script work that may be present.