by pasta 3369 days ago
This looks like a fun project indeed!

Unfortunately, every time I read something about minifiers I get the feeling that people are optimizing the wrong problem.

If you gzip data over the line it's already compressed. So minifying your stuff will only help you a little.

The problem is on the client side. You can compress what you like but if the browser starts dropping frames because it has to compile/handle a ton of Javascript and CSS then minifying doesn't help the end user.

10 comments

> If you gzip data over the line it's already compressed. So minifying your stuff will only help you a little.

For small files you might be mostly correct, but for larger ones min+compress can produce much better gains than compression alone.

IIRC the algorithm used employs a rolling compression window, and can only match strings of tokens whose distance apart is smaller than that window. IIRC the default window is 8KBytes and the maximum is 32KBytes. Even if you use the maximum at the expense of CPU time, that isn't going to cover many large files. Minifying increases the effective range of the compression window: each match is shorter, but you will find more matches, and this usually balances out in a way that benefits the compression result.

It isn't quite that simple in reality as there is huffman encoding and other tricks in the mix. This means that even for inputs smaller than the compression window you may see some benefit as minifying can reduce the input data's alphabet significantly.
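
The window limit can be observed directly. The following is a minimal sketch, assuming Node.js and its built-in `zlib` module; the buffer sizes and the `pseudoRandomBytes` helper are made up for the demonstration. It compresses the same 2 KB block twice, once with the repeat close by and once with 40 KB of unrelated data in between, which puts the repeat beyond deflate's 32 KB maximum window.

```javascript
const zlib = require('zlib');

// Deterministic pseudo-random bytes (a simple LCG), so the data is effectively
// incompressible and the result is repeatable. Hypothetical helper.
function pseudoRandomBytes(length, seed) {
  const out = Buffer.alloc(length);
  let state = seed >>> 0;
  for (let i = 0; i < length; i++) {
    state = (state * 1664525 + 1013904223) >>> 0;
    out[i] = state >>> 24; // keep the high byte of the state
  }
  return out;
}

const data = pseudoRandomBytes(2048 + 40960, 1);
const block = data.subarray(0, 2048); // the content that gets repeated
const filler = data.subarray(2048);   // 40 KB, larger than the 32 KB window

// Repeat is 2 KB behind: inside the window, so it can become back-references.
const near = zlib.deflateRawSync(Buffer.concat([block, block, filler]));
// Repeat is 42 KB behind: outside the window, so it has to be encoded again.
const far = zlib.deflateRawSync(Buffer.concat([block, filler, block]));

console.log({ near: near.length, far: far.length, difference: far.length - near.length });
```

Both inputs contain exactly the same bytes in a different order, so any size difference comes from whether the compressor could still "see" the first copy of the block.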

Ignoring the "why it helps", it is easy to show that it does help in a great many real cases:

  ds@s2:/tmp$ wget --quiet https://code.jquery.com/jquery-3.2.0.min.js
  ds@s2:/tmp$ wget --quiet https://code.jquery.com/jquery-3.2.0.js
  ds@s2:/tmp$ gzip jquery-3.2.0.min.js
  ds@s2:/tmp$ gzip jquery-3.2.0.js
  ds@s2:/tmp$ ls -l j*
  -rw-r--r-- 1 ds ds 79201 Mar 16 21:30 jquery-3.2.0.js.gz
  -rw-r--r-- 1 ds ds 30023 Mar 16 21:30 jquery-3.2.0.min.js.gz
In this example the result of min+comp is less than 40% the size of the result from compression alone.

For completeness, minifying alone achieves less than compression alone:

  -rw-r--r-- 1 ds ds 267686 Mar 16 21:30 jquery-3.2.0.js
  -rw-r--r-- 1 ds ds  79201 Mar 16 21:30 jquery-3.2.0.js.gz
  -rw-r--r-- 1 ds ds  86596 Mar 16 21:30 jquery-3.2.0.min.js
  -rw-r--r-- 1 ds ds  30023 Mar 16 21:30 jquery-3.2.0.min.js.gz
One further factor is the CPU time the client spends decompressing and parsing the content, but this is likely to be insignificant compared to the network or local IO time. If a device's CPU is under-powered enough for this to be significant, it is unlikely to be able to run the decompressed code with useful performance.
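
The same four-way comparison can be scripted. This is only a sketch, assuming Node.js: `naiveMinify` and `sizeReport` are hypothetical helpers, and the "minifier" just strips `//` comments and indentation, which is far cruder than what produced `jquery-3.2.0.min.js`.

```javascript
const zlib = require('zlib');

// Illustration only: drops // line comments and indentation. It would break
// code with "//" inside strings (URLs, for instance); real minifiers parse
// the source instead of using a regex.
function naiveMinify(source) {
  return source
    .split('\n')
    .map((line) => line.replace(/\/\/.*$/, '').trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

// Byte counts for the four combinations: raw, gzip, min, min+gzip.
function sizeReport(source) {
  const minified = naiveMinify(source);
  return {
    raw: Buffer.byteLength(source),
    gzip: zlib.gzipSync(source).length,
    min: Buffer.byteLength(minified),
    minGzip: zlib.gzipSync(minified).length,
  };
}
```

Something like `sizeReport(require('fs').readFileSync('some-file.js', 'utf8'))` would print the sizes for a local file; the exact numbers depend on the input and on the minifier used.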
Most of the gains in there are from stripping out comments. That plus whitespace removal gets you most of the benefit. I don't think the parent was advocating for dropping minification completely, but rather against investing massive effort when you're already at the crest of the curve.
Stripping out comments would be one, but removing all useless code and optimizing it using tools like Google Closure Compiler is way more effective on most websites that use a single bundle for everything.
When the subject is CSS, dead code removal is a much more complex problem, and only possible if your usage falls within certain constraints. Your best bet is a component system with scoped styles that ensures you are only loading what is required.
It's not like this is a zero sum game.

Attempts at improvement don't hurt at all, and in some cases can help a ton.

But they can hurt sometimes -- not all optimizations are always safe.
Of course, but there is still value in "unsafe optimizations" for those who won't be impacted by them.
Are you sure that's not just comments being removed?
I had an article about that too. If you have to do just one, you should go with zopfli or brotli instead of minifying. Having both minification and some kind of compression on top does help the file sizes.

https://luisant.ca/brotli-css

Also, purifycss and uncss are your friends to cut stuff down, to reduce the final load for the user.

> So minifying your stuff will only help you a little.

True, the difference between 10KB compressed and 7KB minified+compressed is negligible for your visitors, but it still takes 30% off of your traffic bill.

This might be the only valid reason. But only for websites that have a huge amount of traffic.
And only the first visit.
Unless you are inlining your CSS.
One massive thing minifying does is dead code elimination (slightly less applicable to CSS but it still applies using some build stacks)

We can build a "prod" version of the app and the minifying process will drop all the debugging code as well as any unused or uncalled functions from the output.

What JS Minifier do you know that does dead code elimination?

I would have thought that understanding which functions of a dynamic language can be safely removed would require parsing/AST analysis beyond that found in the typical minifier.

I use uglifyjs and it does a pretty good job of it.

We use blocks like this throughout our codebase:

    if (process.env.NODE_ENV === 'development') {
      fancyDebuggingFunction(stuff)
    }
    // ... some code later
    function fancyDebuggingFunction(stuff) {
      /* do debugging things here */
    }
And when combined with some stuff that sets the process.env.NODE_ENV (not sure how that part works honestly, never really looked into it) it will remove not only the if statement, but also the function if it's not called anywhere and not exported.
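
On the part about setting `process.env.NODE_ENV`: one common approach, and this is an assumption about the setup described here, is a build-time substitution (webpack's DefinePlugin works along these lines) that replaces the expression with a string literal before the minifier runs. A rough sketch of the idea, where `defineEnv` is a hypothetical stand-in for that build step:

```javascript
// Hypothetical stand-in for the build step: replace every occurrence of the
// expression with a string literal for the chosen environment.
function defineEnv(source, env) {
  return source.split('process.env.NODE_ENV').join(JSON.stringify(env));
}

const input = [
  "if (process.env.NODE_ENV === 'development') {",
  '  fancyDebuggingFunction(stuff)',
  '}',
].join('\n');

console.log(defineEnv(input, 'production'));
// The condition now compares two different string literals, so it is always
// false and a minifier can drop the whole block.
```

Once the call inside the `if` is gone, nothing references `fancyDebuggingFunction` any more, which is what lets the minifier remove the function as well.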

Throw in Webpack 2's import/export bundling stuff, you can exclude whole modules which you don't use in production which can really reduce the size of your code.

And moving forward if the JS ecosystem can ever get a handle on a good way to move to import/export by default, you'll start seeing massive gains in this area by being able to strip out any parts of a library that you don't use.

Also, I believe most widely used minifiers use AST parsing to do their work.

Closure compiler, while trickier to use than the other options, will do dead code elimination, constant propagation, and other classic whole-program optimizations.
I don't know about the typical minifier, but most of the popular ones (uglifyJS, closure etc.) work by first building an AST and then analyzing that.
The dojo build system (as clunky as it is) has supported dead code elimination at the module level through dependency analysis for a long time.
Rollup (and relatives) even do tree shaking if you use ES2015 Modules. (Those modules have a static-analysis friendly set of requirements.)
Webpack, rollup, closure compiler + uglify. In JS land it's commonly called "tree shaking".
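
As a very rough picture of what tree shaking amounts to, here is a toy reachability pass over a made-up dependency graph. This is a sketch of the general idea only, not how Rollup, webpack or Closure Compiler are implemented; all names in the graph are hypothetical.

```javascript
// Toy module graph: each export lists the other exports it uses.
// All names here are made up for illustration.
const graph = {
  main: ['formatDate'],
  formatDate: ['padZero'],
  padZero: [],
  debugDump: ['padZero'],
  unusedHelper: [],
};

// Return the names reachable from the entry points; everything else is dead code.
function reachable(graph, entryPoints) {
  const seen = new Set();
  const stack = [...entryPoints];
  while (stack.length > 0) {
    const name = stack.pop();
    if (seen.has(name)) continue;
    seen.add(name);
    for (const dep of graph[name] || []) stack.push(dep);
  }
  return seen;
}

const kept = reachable(graph, ['main']);
const dropped = Object.keys(graph).filter((name) => !kept.has(name));
console.log({ kept: [...kept], dropped });
```

Static `import`/`export` declarations are what make a graph like this cheap to build; with dynamic `require` calls the edges are much harder to know ahead of time.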
I have seen a script which was 250 KB gzipped, but 125 KB minified+gzipped.

It was a script embedded in other people's pages (and yes, it delivered substantial functionality, it was not just a tracker), so minification saved a lot of traffic/money for the company.

Minifying also speeds up decompression, because less data has to be produced by the decompressor. Compression and minifying are really different optimizations, as the minification does not need to be reversed. So each one has benefits.
It will still have a small impact:

-) Less work for decompression

-) Less total length means lexing+parsing will be a bit quicker

-) shorter class names will also mean a lower memory consumption because of shorter strings, and ideally fewer allocations if some pooling or smart allocator is used

But those points can probably be completely ignored, since JS is a way way bigger factor.

Does minification speed up parsing (fewer characters to tokenise)? If so, then minification+compression would be better than compression alone as it would make up a bit for the time spent decompressing.
Gzip compression over https is a vulnerability[1].

Depending on the scale, shaving a few kB here and there can amount to significant savings in the long run.

[1]: https://en.wikipedia.org/wiki/BREACH

I am not a security researcher, but I think you could keep the benefits of both compression and security, as long as you're careful on the server side:

Say you have a document structured like [boring data] [secret data] [boring data]. I don't know if any existing compressor lets you do this, but the gzip file format (really the 'deflate' format used inside it) allows you to encode this (schematically) as follows:

[compressed boring data] || [uncompressed secret data] || [compressed boring data]

where each || is i) a chunk boundary (the Huffman compression stage is done per-chunk, so this avoids leaks at that level), and ii) a point where the encoder forgets its history - ie, you simply ban the encoder from referencing across the || symbols.

If you wanted, you could even allow references between different "boring" chunks (since the decoder state never needs resetting), just as long as you make sure not to reference any of the secret data chunks.

Edit to add: Also, if the "boring" parts are static, you can pre-compress just those chunks and splice them together, potentially saving you from having to fully recompress an "almost static" document just because it has some dynamic content.
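
A simplified variant of this idea can be sketched with Node.js's built-in `zlib`. Note that this swaps in a different, coarser technique than the one described above: instead of chunk boundaries inside one deflate stream, it concatenates separate gzip members, compressing the boring parts normally and writing the secret at level 0 (stored). Cross-references between boring chunks are not possible this way, and whether every HTTP client accepts a multi-member gzip body is not something this sketch checks. `gzipWithIsolatedSecret` and the sample strings are made up for illustration.

```javascript
const zlib = require('zlib');

// Compress the boring parts normally and emit the secret at level 0 (stored,
// not compressed). Each gzip member starts with an empty history, so nothing
// in one part can be matched against another.
function gzipWithIsolatedSecret(boringBefore, secret, boringAfter) {
  return Buffer.concat([
    zlib.gzipSync(boringBefore),
    zlib.gzipSync(secret, { level: 0 }),
    zlib.gzipSync(boringAfter),
  ]);
}

const body = gzipWithIsolatedSecret(
  '<html><body><p>hello hello hello</p>',
  '<input name="csrf" value="s3cr3t-token">',
  '<p>bye bye bye</p></body></html>'
);

// Node's gunzip reads concatenated members back as one document.
console.log(zlib.gunzipSync(body).toString());
```

Because each member is independent, the members for static parts could also be compressed once and reused, in the spirit of the pre-compression point above.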

The other benefit is from combining files and reducing the number of http requests. Minifiers aren't really needed for that, but they do make for some nicer development workflows.
> The other benefit is from combining files and reducing the number of http requests. Minifiers aren't really needed for that, but they do make for some nicer development workflows.

Debatable with HTTP/2. Furthermore, separate files are easier to cache. If one of them doesn't change it doesn't have to be loaded again. That's my experience with bundles, especially when one uses asynchronous module definition instead of babel, webpack and co.

What about cache expiry? Minifiers can generate a hash and tack that on to the file name so it's cached forever. With http2 can you do this without the back and forward conversation?
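
For reference, the hash-in-the-file-name trick looks roughly like this. It is a minimal sketch assuming Node.js; `hashedFileName`, the 8-character hash length and the naming pattern are illustrative choices, not what any particular build tool emits.

```javascript
const crypto = require('crypto');
const path = require('path');

// Derive a file name from the content, so the name changes exactly when the
// content does and the old URL can safely be cached forever.
function hashedFileName(fileName, content) {
  const hash = crypto.createHash('sha256').update(content).digest('hex').slice(0, 8);
  const ext = path.extname(fileName);        // e.g. ".js"
  const base = path.basename(fileName, ext); // e.g. "app"
  return `${base}.${hash}${ext}`;
}

console.log(hashedFileName('app.js', 'console.log(1);'));
console.log(hashedFileName('app.js', 'console.log(2);'));
```

The page that references the file still has to be revalidated so it can point at the new name; only the hashed assets get the long cache lifetime.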
Number of HTTP requests is not a concern with HTTP2 server push and multiplexing. In fact it's usually better to have 2 fairly sized files that can be downloaded in parallel rather than 1 large file.
Probably not true, since most HTTP/2 implementations that I know of use time multiplexing, which means that only one element at a time can pass, so the time is exactly the same.

I mean if I split a file in 10 exact pieces or if I split two files in the exact same 10 pieces as well I still have the same data.

(Edit: Well basically two files have mostly more data since they both might contain a BOM or so)

There is a nice diagram here showing how requests/responses are able to be sent in parallel (not time-based multiplexing): https://developers.google.com/web/fundamentals/performance/h...

And see Nginx's `http2_max_concurrent_streams` option: https://nginx.org/en/docs/http/ngx_http_v2_module.html#http2...