by blibble 3348 days ago
I wonder how many hundreds of kilobytes that adds up to in a 20 GB Windows install

Haven't looked at the code that generates this list (if available), but it sure looks to me like there is double-counting going on here. Most files in \Windows\WinSxS are hardlinks.

Example pulled at random:

    D:\analysis\Windows\WinSxS\amd64_microsoft-windows
    -imageres_31bf3856ad364e35_10.0.15063.0_none_edd17c6c30b4bf9f\imageres.dll

     ...

     - Total: 4435


    D:\analysis\Windows\System32\imageres.dll

     ...

     - Total: 4435
I am willing to bet those are the same file hardlinked and only waste the 4435 bytes once, verifiable thusly:

    cmd> fsutil hardlink list \Windows\System32\imageres.dll
    Windows\WinSxS\amd64_microsoft-windows-imageres_31bf3856ad364e35_6.3.9600.16384_
    none_cd7c033dcbdd0cab\imageres.dll
    Windows\System32\imageres.dll
As I suspected, that doesn't look hardlink aware.

A way to correct for this would be to open the files and de-dupe by (((ULONGLONG)nFileIndexHigh) << 32) | nFileIndexLow in this structure: https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...
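
A rough sketch of that de-dupe idea in Python rather than the Win32 API directly. It assumes Python 3.5 or later, where `os.stat` on Windows reports the file index as `st_ino` and the volume serial number as `st_dev`; the function name `unique_file_bytes` is made up for illustration. Keying on the volume as well as the file index is my addition, since file indexes are only unique per volume.

```python
import os
import stat


def unique_file_bytes(root):
    """Sum file sizes under root, counting each hardlinked file only once."""
    seen = set()   # (volume, file index) pairs already counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # unreadable entry, skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks and other non-regular files
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for a file we already counted
            seen.add(key)
            total += st.st_size
    return total
```

On filesystems that do not expose a usable file index this would under-count, so treat it as a starting point only.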

Edit:

> Seems to check though https://github.com/riverar/eoraptor/blob/master/FileEnumerat...

No, it does not: reparse points are used for symbolic links and junctions, not hardlinks.
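
To illustrate the distinction, a small Python sketch (not taken from the linked project; `classify_link` is a hypothetical helper). A symlink shows up through the link bit in `lstat`, while a hardlinked file looks like any regular file and only gives itself away through a link count above one. Junctions are not handled here.

```python
import os
import stat


def classify_link(path):
    """Return 'symlink', 'hardlinked' or 'plain' for a file path."""
    st = os.lstat(path)  # do not follow symlinks
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
        return "hardlinked"  # more than one directory entry points at this file
    return "plain"
```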

What's the story on that footprint, anyway? We went from (from memory) ~250-350MB Win98 to ~700-800MB XP to ~10-15GB(!?!?!) Win7, and just up from there. Plus the default settings seemed to start going really crazy with swap space/caching around the time of Win7. Another 10+GB if you didn't tell it to knock that crap off. Why the sudden, giant shift? They didn't add 10-15x the features, that's for sure.
WinSxS (Windows Side By Side) assemblies were introduced to avoid DLL hell by allowing Windows to store multiple versions of installed DLLs. So even a minor security patch may leave the former version around because other apps may use/expect it. I think that might add some bloat over time? Also Windows Update installer caches. A ton of Windows updates actually leave their installers around in case you want to uninstall them. That can add up! I've seen it easily get to 1-2 GB.
They did, to some extent. Plug in almost any piece of standard consumer hardware and it'll probably mostly just work without a network connection. All those drivers don't take up zero space, but the benefit when my mom plugs in a printer and it just works makes it worth it.
I think at least some of it is the Windows on Windows stuff to allow 64 bit machines to run both 32 and 64 bit software. Weren't the 32 bit versions of Win7 about half the size of the 64 bit ones?

There's still a lot of size growth over time, of course.

You don't keep a 20gb Windows install in RAM all the time though. Bloat in explorer.exe is the issue here
To this point, the files in the install are compressed, and I'm sure XML metadata is the sort of thing that compresses well with the DEFLATE algorithm they likely use.
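
A quick way to get a feel for this with Python's `zlib` module, which implements DEFLATE. The sample below is made-up, highly repetitive XML standing in for real metadata, so the ratio it prints says nothing about what an actual Windows image achieves.

```python
import zlib


def deflate_ratio(data, level=6):
    """Return compressed size divided by original size using zlib (DEFLATE)."""
    return len(zlib.compress(data, level)) / len(data)


# Hypothetical sample: one XML element repeated many times.
sample = b'<rdf:li xml:lang="x-default">example</rdf:li>\n' * 200
print(f"compressed/original = {deflate_ratio(sample):.3f}")
```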
I discovered this a few months ago, when I went looking for XMP metadata in the filesystem and used the magic number trick to extract it from files of all kinds.

I found it is common to find XMP inside media files embedded inside Windows EXE, as well as Linux binaries, JAR, Microsoft Word and other composite formats.
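
A minimal sketch of what such a scan can look like in Python, assuming the packet is stored as uncompressed UTF-8 so that the `<?xpacket begin=...?>` header and `<?xpacket end=...?>` trailer appear literally in the bytes. The name `find_xmp_packets` is made up, and packets sitting inside a compressed stream or encoded as UTF-16 would be missed.

```python
import re

# Header and trailer processing instructions that wrap an XMP packet.
XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def find_xmp_packets(blob):
    """Return the raw bodies of all XMP packets found in a byte string."""
    return [m.group(1) for m in XPACKET_RE.finditer(blob)]
```

Reading a whole EXE or JAR into memory and passing the bytes to `find_xmp_packets` is enough for a first look; anything fancier would need to understand the container format.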

Complex media objects frequently use an encapsulation system such as ZIP. When a PNG file is incorporated into a JAR or a Word document, the XMP content in the file may not be compressed, because the archiver may not attempt to compress the PNG file since it assumes the data is already compressed.

XMP is very good from the viewpoint of content creators in terms of having comprehensive metadata incorporated into files so that it does not get out of sync. XMP data is RDF data using an improved version of Dublin Core, IPTC and other industry RDF vocabularies. You can write SPARQL queries right away, plus XMP specifies a way to make an XMP packet based on pre-existing metadata in common industry schemes.

The XMP packets can get big, and you sometimes see people make a tiny GIF image (say a transparent pixel GIF) that is bulked up 100x because of bulky metadata. Once you package data for delivery to consumers you want to strip all that stuff out.

The XMP spec is here:

http://www.adobe.com/devnet/xmp.html

There is some brilliant thinking in there, but also things that will make your head explode, such as the method for embedding an XMP packet into a GIF.

Hmm... would be interesting if we started taking XMP into account when designing compression programs then...
You could actually take any ancillary chunks into consideration, i.e. chunks whose type starts with a lower-case letter. These are non-critical, so a decoder does not need them to display the image.
> When a PNG file is incorporated into a JAR or a Word Document, the XMP content in the file may not be compressed because the archiver may not attempt to compress the png file since it assumes the data is already compressed.

PNG can apply DEFLATE to blocks though, right? Does XMP not use it?

Deflating can be applied to some chunks, but not at will. The zTXt chunk can be compressed while for example the tEXt chunk cannot. The newer iTXt chunk can vary.

The two former are limited in scope and language encoding support, so iTXt is typically used for extended textual data such as XML/XMP etc. But whether it is saved compressed or not depends on the PNG encoder/host used (there can also be multiple instances of these chunks in the same file).
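
For anyone who wants to check their own files, here is a rough Python sketch that walks the chunks of a PNG and reports, for the three text chunk types, whether the payload is stored compressed. It relies on the chunk layout from the PNG spec (4-byte big-endian length, 4-byte type, data, 4-byte CRC) and on the iTXt compression flag being the byte right after the null-terminated keyword. The function names are made up and the CRC is not verified.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data):
    """Yield (type, body) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + body + CRC


def describe_chunks(data):
    """Return (type, is_ancillary, text_compressed) per chunk.

    text_compressed is None for chunks that are not text chunks.
    """
    out = []
    for ctype, body in iter_chunks(data):
        ancillary = bool(ctype[0] & 0x20)  # lower-case first letter
        compressed = None
        if ctype == b"zTXt":
            compressed = True    # always deflated
        elif ctype == b"tEXt":
            compressed = False   # never deflated
        elif ctype == b"iTXt":
            _keyword, _sep, rest = body.partition(b"\x00")
            compressed = rest[:1] == b"\x01"  # compression flag byte
        out.append((ctype.decode("ascii"), ancillary, compressed))
    return out
```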

Photoshop, for instance, saves uncompressed, I guess to give fast access for performance reasons (i.e. file viewers using galleries for numerous images while displaying their meta-data).

Data contained in an exe (or dll) is not necessarily in RAM at all times.
One could fit about 10 Linux live distros in those 20 GB. Those "few hundreds of kilobytes" are indeed trivial.
Windows comes with a ton of builtin drivers. So it will work with a lot of devices out of the box without needing an Internet connection to update.
Same is true in most Linux distros. My stock Debian kernel comes with 2300+ drivers, which take only ~130MB.
https://blogs.msdn.microsoft.com/e7/2008/11/19/disk-space/

Bear in mind, that was the Vista days, and Windows 10 now supports even more devices. 800MB of drivers at the time. I would not be surprised if Windows supported by default upwards of 10000 drivers. It works pretty much flawlessly on even somewhat obscure and old hardware. And when your OS is installed on that many consumer devices, and not informally standardized servers, you are going to meet those weird devices one way or the other.

Windows drivers may also take up a bit more space individually because of the overhead caused by either the Windows Driver Model or Windows Driver Framework, but that's the price to pay to not have a driver crashing and bringing down your entire system. Yes, Linux, I'm looking at you.

None of them are ever for any of my hardware
I have the opposite experience myself.
What hardware are you using that the defaults don't work?
I've installed the latest Ubuntu in 2016 to find my mouse, wifi, and printer did not work. And my Cinema Display was stuck at 1024x768.
The second thing you want to make work is probably the network anyway, so I don't really see where the quality is here.