by moolcool 3345 days ago
You don't keep a 20 GB Windows install in RAM all the time though. Bloat in explorer.exe is the issue here.

To this point, the files in the install are compressed, and I'm sure XML metadata is the sort of thing that compresses well with the DEFLATE algorithm they likely use.
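
A quick way to check the compression claim is to run a small XMP-style fragment through zlib, which implements DEFLATE. This is only a sketch: the XML below is a made-up sample, not metadata taken from a Windows install, and real ratios depend on the actual content.

```python
import zlib

# Hypothetical XMP-style fragment used purely as sample input.
xml = b"""<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Example</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Someone</rdf:li></rdf:Seq></dc:creator>
   <dc:rights><rdf:Alt><rdf:li xml:lang="x-default">None</rdf:li></rdf:Alt></dc:rights>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

# zlib wraps a DEFLATE stream in a small header and checksum.
compressed = zlib.compress(xml, 9)
print(f"{len(xml)} bytes -> {len(compressed)} bytes")
```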
I discovered this a few months ago, when I went looking for XMP metadata in the filesystem and used the magic number trick to extract it from files of all kinds.
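
For anyone who wants to try it, here is a minimal sketch of that kind of scan. It assumes the packet is stored uncompressed and in an 8-bit encoding such as UTF-8, and it relies on the `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions that wrap an XMP packet. The function names are my own, not from any library.

```python
import re

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>,
# so it can be located in arbitrary bytes without parsing the host format.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>.*?<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def find_xmp_packets(data: bytes) -> list[bytes]:
    """Return every 8-bit-encoded XMP packet found in a byte string."""
    return PACKET_RE.findall(data)


def scan_file(path: str) -> list[bytes]:
    """Read a whole file into memory and scan it for XMP packets."""
    with open(path, "rb") as f:
        return find_xmp_packets(f.read())
```

Packets encoded as UTF-16 or UTF-32, or sitting inside a compressed stream, will not match this pattern, and reading the whole file at once is only reasonable for files that fit in memory.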

It turns out XMP is common inside media files embedded in Windows EXEs, as well as in Linux binaries, JARs, Microsoft Word documents and other composite formats.

Complex media objects frequently use an encapsulation system such as ZIP. When a PNG file is incorporated into a JAR or a Word document, the XMP content in the file may not be compressed, because the archiver may skip compressing the PNG file on the assumption that its data is already compressed.
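
You can see which members of a ZIP-based container were stored without compression by looking at the compression method recorded for each entry. A rough sketch using Python's standard `zipfile` module follows; `stored_members` is a hypothetical helper name, and the archive is passed in as bytes.

```python
import io
import zipfile


def stored_members(archive: bytes) -> list[str]:
    """List the entries of a ZIP archive that were stored uncompressed."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.compress_type == zipfile.ZIP_STORED
        ]
```

Any PNG that shows up in this list is sitting in the container byte for byte, so an uncompressed XMP packet inside it stays uncompressed in the JAR or DOCX as well.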

XMP is very good from the viewpoint of content creators in terms of having comprehensive metadata incorporated into files so that it does not get out of sync. XMP data is RDF data using an improved version of Dublin Core, IPTC and other industry RDF vocabularies. You can write SPARQL queries right away, plus XMP specifies a way to make an XMP packet based on pre-existing metadata in common industry schemes.

The XMP packets can get big, and you sometimes see people make a tiny GIF image (say a transparent pixel GIF) that is bulked up 100x because of bulky metadata. Once you package data for delivery to consumers you want to strip all that stuff out.

The XMP spec is here:

http://www.adobe.com/devnet/xmp.html

There is some brilliant thinking in there, but also things that will make your head explode, such as the method for embedding an XMP packet into a GIF.

Hmm... would be interesting if we started taking XMP into account when designing compression programs then...
You could actually take any ancillary PNG chunks into consideration, i.e. chunks whose type starts with a lower-case letter. These are non-critical, so a decoder is free to ignore them.
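
The lower-case rule is just bit 5 (0x20) of the first byte of the chunk type. Here is a small sketch that walks the chunks of a PNG and flags the ancillary ones. It assumes the whole file is already in memory and does not verify CRCs; the function names are mine.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png: bytes):
    """Yield (chunk_type, data) for each chunk in a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        yield ctype, png[pos + 8:pos + 8 + length]
        pos += 12 + length


def is_ancillary(ctype: bytes) -> bool:
    """A chunk is ancillary when the first letter of its type is lower-case."""
    return bool(ctype[0] & 0x20)
```

A compressor that wanted to treat metadata specially could filter on `is_ancillary` and handle chunks such as `tEXt`, `zTXt` and `iTXt` separately from the image data.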
> When a PNG file is incorporated into a JAR or a Word Document, the XMP content in the file may not be compressed because the archiver may not attempt to compress the png file since it assumes the data is already compressed.

PNG can apply DEFLATE to blocks though, right? Does XMP not use it?

Deflating can be applied to some chunks, but not at will. The zTXt chunk can be compressed while for example the tEXt chunk cannot. The newer iTXt chunk can vary.

The former two are limited in scope and language encoding support, so iTXt is typically used for extended textual data such as XML/XMP etc. But whether it is saved compressed or not depends on the PNG encoder/host used (there can also be multiple instances of these chunks in the same file).

Photoshop, for instance, saves uncompressed, I guess to give fast access for performance reasons (e.g. file viewers that show galleries of numerous images while displaying their metadata).
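
To check what a given encoder did, you can read the compression flag in the iTXt chunk body. The sketch below follows the iTXt layout from the PNG spec (keyword, NUL, compression flag, compression method, language tag, NUL, translated keyword, NUL, text); `parse_itxt` is a hypothetical helper and takes the chunk data without the length, type and CRC fields.

```python
import zlib


def parse_itxt(data: bytes) -> tuple[str, bool, str]:
    """Split an iTXt chunk body into (keyword, was_compressed, text)."""
    keyword, rest = data.split(b"\x00", 1)
    flag, method = rest[0], rest[1]
    # Language tag and translated keyword are each NUL-terminated.
    _language, rest = rest[2:].split(b"\x00", 1)
    _translated, text = rest.split(b"\x00", 1)
    if flag == 1:
        if method != 0:
            raise ValueError("unknown iTXt compression method")
        text = zlib.decompress(text)  # method 0 is zlib/DEFLATE
    return keyword.decode("latin-1"), bool(flag), text.decode("utf-8")
```

XMP in a PNG is normally stored under the keyword `XML:com.adobe.xmp`, so a flag of 0 on that chunk means the packet is sitting in the file as plain text.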

Data contained in an exe (or dll) is not necessarily in RAM at all times.