Static Asset Compilation (autoref.com)
48 points by autoref 5021 days ago
15 comments

Fairly standard stuff. If you're going to have a title with "you're doing it wrong" you should have some unique insight to support your dramatised title.
I felt the same way. I was hoping the article would be about a radically different approach, but it's mostly just best practices.
Agreed - getting sick of these thoroughly underwhelming "done right" and "you're doing it wrong" blog posts. Nuts to 'em.
With such a complicated system, I think you are missing out on the most significant speed optimization technique: reducing HTTP requests.

For reference: http://developer.yahoo.com/blogs/ydn/posts/2007/04/rule_1_ma...

It's laudable that you are paying attention to caching, but you don't compile all your files into one file. It seems like you could pick up a lot of ground here by at least concatenating all CSS into one file and all JS into one file.

Also, you could load jQuery from the Google AJAX API endpoint. That way users have a higher chance of having already loaded jQuery.

Also using the same CSS/JS products across multiple pages would help.
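
For anyone who wants to try the concatenation step, here is a minimal sketch. The helper name `concat_assets` and the file names are made up for illustration, and a real build would probably also minify.

```python
from pathlib import Path


def concat_assets(sources, dest):
    """Join several CSS or JS files into one bundle, in the order given."""
    parts = []
    for src in sources:
        text = Path(src).read_text(encoding="utf-8")
        # A trailing newline keeps the last statement of one file from
        # running into the first statement of the next.
        parts.append(text if text.endswith("\n") else text + "\n")
    out = Path(dest)
    out.write_text("".join(parts), encoding="utf-8")
    return out
```

Order matters for both CSS and JS, so the caller is expected to pass the sources in dependency order.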

An excellent point, but you have to consider warm cache vs cold cache optimizations. For a cold cache, it's better to combine assets and reduce HTTP requests. We do that on our homepage.

For a warm cache, it's better to split assets up so they are cached in finer chunks. If I added jQuery into every page's JS, there would be fewer HTTP requests, but it would pull jQuery every time, making the payload much larger. There's a balance. I'll write another post about warm vs cold cache optimizations soon.

"using the same CSS/JS products across multiple pages would help."

Definitely. Using jQuery on half your site and YUI on the other half is pretty bad from all angles.

"you could load jquery from the Google AJAX API endpoint."

Yeah. Two reasons we don't: 1. I'm in security, and trust no one. 2. HTTPS connection reuse vs negotiation with another host. I have yet another post in the pipe about SSL optimizations.

I usually have three "chunks" of CSS/JS per page, one for CSS/JS that's shared across all pages of the whole site, one that's common among a "group" of pages and one that's unique to just that page.

Of course, not all pages get all three chunks, but I find it's a reasonable tradeoff between reducing the number of requests and not just including everything on every page.
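
Roughly, the three-chunk selection could look like the sketch below. The function `bundles_for`, the bundle naming scheme, and the two lookup structures are all hypothetical, just one way to express the idea.

```python
def bundles_for(page, group_of, page_bundles):
    """Return the bundles a page loads: site-wide, then group, then page."""
    bundles = ["site"]  # shared across every page
    group = group_of.get(page)
    if group is not None:
        bundles.append("group-" + group)  # shared by a group of pages
    if page in page_bundles:
        bundles.append("page-" + page)  # unique to this page
    return bundles
```

A page with no group and no page-specific code then loads only the site-wide chunk.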

Hmm, I've just left a comment suggesting combining assets. Based on what you've written here you've already answered one of my questions... I look forward to the warm/cold cache post.
You don't need to rename the files. As of a few months ago, you can configure CloudFront to take query strings into account when caching, so you can simply link to the file as normal but append "?<your_hash_here>" to the filename. (I actually prefer using the last modified timestamp over a hash.) IMHO, this is better because it requires less magic on the origin server. And even ancient references to a file (a logo someone hotlinked, for example) will still render rather than 404, so long as the name hasn't changed. No need to keep tons of old revisions of files around.
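
A sketch of the timestamp variant, assuming a hypothetical helper `versioned_url` that is called when rendering templates. The CDN still has to be configured to include query strings in its cache key.

```python
import os


def versioned_url(url_path, file_path):
    """Append the file's last-modified time as a cache-busting query string."""
    mtime = int(os.path.getmtime(file_path))
    return f"{url_path}?{mtime}"
```

The file name itself never changes, which is why old hotlinks keep resolving.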
This isn't recommended since many browsers and proxies do not cache resources that are referenced using a query string, even if a cache-control or expires header is set appropriately. Google says Squid up to 3.0 will not do so:

https://developers.google.com/speed/docs/best-practices/cach...

Fair point and thanks for the link -- I've only seen this phrased as a warning against "some proxies" that I've never personally encountered. But I can live with it. If your proxy is broken in this way, you will have to fetch the asset from the CDN rather than benefit from your local proxy.
Squid 3.0 was released in 2007. It can be argued that this is an out of date recommendation. My experience using query strings has been fine.
To be fair, I think it was still the current version up through 2011.

The more important point is that the failure mode is simply that the assets load from the CDN as if there were no squid proxy. This is not ideal, but it's not so bad either.

Another reason this is not recommended: if you rename the files, you can upload them to the server first and then reload your app.

If you don't, there may be a short window before the reload during which the old app serves the newer resources.

Could someone say if using HttpGzipStaticModule really helps? Gzipping small static resources on the fly should not cost much CPU.

Surely a nice thing to have, but does it help?
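
For context, nginx's gzip static module serves a precompressed `.gz` file sitting next to the original instead of compressing per request. A minimal sketch of producing those files at build time follows; `precompress` is a made-up name. Since the work happens once, the slowest compression level is affordable.

```python
import gzip
from pathlib import Path


def precompress(path, level=9):
    """Write `<path>.gz` next to the file and return the new path."""
    src = Path(path)
    dest = src.with_name(src.name + ".gz")
    # Level 9 is the slowest and smallest; fine for a one-off build step.
    dest.write_bytes(gzip.compress(src.read_bytes(), compresslevel=level))
    return dest
```

Whether this beats on-the-fly gzip in practice depends on traffic and file sizes, so it is worth measuring.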

I found that renaming CSS files using the hash of their contents does not always work, because changes to dependencies (e.g. images) won't always bubble up to the CSS. I forget all of the reasons why it didn't always work, but I think it had to do with CDN invalidation for files that I could not rename (e.g. index.html).

The process I use computes the hash of every file and creates a dependency map. I then rename each file using the hash of its contents together with its dependencies.

Right. Images and fonts have to be written and hashed first, then used in the template rendering of the CSS file. The CSS references the assets with hashes in the filenames.
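
One way to sketch the dependency-aware hashing described above. The function `asset_digest` and the in-memory `contents` and `deps` maps are assumptions for illustration, and the dependency graph is assumed to have no cycles.

```python
import hashlib


def asset_digest(name, contents, deps, _cache=None):
    """Hash a file's bytes together with the digests of what it references."""
    if _cache is None:
        _cache = {}
    if name not in _cache:
        h = hashlib.sha1(contents[name])
        # Sort so the digest does not depend on the order deps were listed in.
        for dep in sorted(deps.get(name, ())):
            h.update(asset_digest(dep, contents, deps, _cache).encode("ascii"))
        _cache[name] = h.hexdigest()
    return _cache[name]
```

With this, changing an image changes the digest of every CSS file that references it, even when the CSS source is untouched.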
Can someone with knowledge of both this and Rails 3.1 explain the difference? Seems very similar.
Yes, this article is written for people not using Rails and Sprockets. It's pretty amazing how many best practices the Rails asset pipeline enforces. It will also concatenate the JS and CSS files to reduce HTTP requests and writes .gz files to disk automatically. When used with the asset_sync gem, it can also push these to S3 or your CDN to avoid your web server altogether.
In the Python world we have webassets[1], which does something similar (to Jammit, anyway). It is a little more complicated to use than Sprockets, but I'd argue that it is also a bit more flexible, thanks to filter chaining: e.g. compile SASS, merge the files, add vendor prefixes, optimize, compress, then gzip, all as a single chain.

[1] http://elsdoerfer.name/docs/webassets/

You're right, they basically reimplemented Sprockets, a.k.a. the Rails asset manager.
Good tips. I've found that http://pngquant.org generates smaller pngs than optipng, but the former is lossy (reduced color palette). I can't tell the difference though.
You can have lossy compression that results in zero difference to the end image on a pixel by pixel basis.

E.g. if the PNG was 32 bit, and had a full color palette but was filled with a single 8 bit color. You could safely, and "loss-ily", convert the PNG to 8 bit and replace the entire color palette with the single entry for the color that is actually used.

That said, PNGQuant uses dithering so there will often be changes apparent if you perform a pixel by pixel comparison in code.

Just like you, I can't visually identify the difference between a PNGQuant image and the 'raw' PNG that was used to create it (at least not on any images that I've seen so far).
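
To make the single-color example concrete, here is a toy sketch in plain Python (no image library). Pixels are RGBA tuples, and `to_indexed` and `from_indexed` are hypothetical helpers. When the image has few enough distinct colors, the palette conversion round-trips exactly.

```python
def to_indexed(pixels, max_colors=256):
    """Convert RGBA tuples to (palette, indices); None if too many colors."""
    palette = []
    lookup = {}
    indices = []
    for px in pixels:
        if px not in lookup:
            if len(palette) == max_colors:
                return None  # would need real (lossy) quantization
            lookup[px] = len(palette)
            palette.append(px)
        indices.append(lookup[px])
    return palette, indices


def from_indexed(palette, indices):
    """Expand an indexed image back to a list of RGBA tuples."""
    return [palette[i] for i in indices]
```

This only covers the exact case. A quantizer like pngquant does the hard part, choosing a palette when there are more colors than slots.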

To nitpick a little: If your source and final are the same, I don't think you can call it lossy, by definition.
Aside from minifying JavaScript, you should probably also consider using Google's Closure Compiler in 'Advanced-Compilation' mode. I believe it does a much better job than traditional minification.

https://developers.google.com/closure/compiler/docs/api-tuto...

Is putting hash digests in filenames really easier than sending Last-Modified headers in the response, parsing If-Modified-Since headers and returning 304 when applicable, and/or using ETags?

I would have thought that most web frameworks do all these things for you automatically by now.

Putting the hash in the filename allows the browser to not even make a request that would result in a 304 response. It also works behind badly behaved proxies and caches that don't properly respect cache headers.
It also allows for pages that were generated and cached before changes were made to still have resources, as well as other cases where you might have divergent sets of resources (split tests, rolling deployments, etc.)
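
A simplified sketch of the difference. `respond` and `HASHED_ASSET_HEADERS` are illustrative names, and real If-None-Match handling also deals with lists of tags and weak validators.

```python
# Headers one might send for a content-hashed URL: the browser can reuse its
# copy for a year without asking, because new content gets a new URL.
HASHED_ASSET_HEADERS = {"Cache-Control": "public, max-age=31536000"}


def respond(etag, if_none_match, body):
    """Return (status, headers, body) for a conditional GET keyed on an ETag."""
    if if_none_match is not None and if_none_match == etag:
        # The client's copy is current, but it still paid for a round trip.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

The 304 path saves bandwidth but not the request itself, which is the cost the hashed filename avoids.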
We use the git commit hash of the checkin that is pushed to production as part of the folder structure for our assets. Has worked really well for us.

It does mean we use more space on S3 though, but it guarantees we won't miss re-seeding some of the files.

The bad part is no file is cached between pushes, right?
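
A sketch of the commit-prefixed layout, with a made-up `asset_url` helper and a placeholder CDN host. It also shows the tradeoff in the question above: a new commit changes every URL, whether or not the file changed.

```python
def asset_url(commit, path, base="https://cdn.example.com"):
    """Build an asset URL namespaced by the deployed git commit hash."""
    return f"{base}/{commit}/{path.lstrip('/')}"
```

Per-file content hashes avoid that cache churn, at the cost of a more involved build.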
Interesting series http://autoref.com/blog/2012/09/07/the-tech-behind-autoref-p...

Full disclosure: I do bizdev at AutoRef.

Why is it safe to include a subset of the SHA1 digest instead of the whole digest? What's the reasoning behind this? Would it make sense to use a shorter hash (e.g. CRC32) instead if your filenames have to be that short?
Because SHA1 tends to have every byte of the digest change if so much as one byte of the message changes (if you can disprove that, you have a much more important result than "Oops our caching is slightly borked"). Accordingly, 10 hex digits is sufficient to guarantee that a change breaks the old cache (1 - 1 / 2^40) of the time. You wouldn't be at risk of birthday-paradoxing your caches even with billions of files in your site's history.
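
The arithmetic can be checked with a short sketch. `short_digest` and `miss_probability` are illustrative names, and the probability assumes the truncated digest behaves like uniformly random bits.

```python
import hashlib


def short_digest(data, hex_digits=10):
    """First `hex_digits` hex characters of the SHA-1 digest of `data`."""
    return hashlib.sha1(data).hexdigest()[:hex_digits]


def miss_probability(hex_digits=10):
    """Chance that a changed file keeps the same truncated digest."""
    # Each hex digit carries 4 bits, so 10 digits give 16**10 == 2**40 values.
    return 1 / 16 ** hex_digits
```

At 10 hex digits that is about 9.1e-13 per change.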
You could probably shave a few bytes off your URLs while achieving the same collision resistance (or alternatively increase the collision resistance in the same number of bytes) if you base64-encoded the hash.
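
A sketch of the base64 idea, using the URL-safe alphabet so the result can go straight into a path. `short_b64_digest` is a made-up helper.

```python
import base64
import hashlib


def short_b64_digest(data, n_bytes=5):
    """URL-safe base64 of the first n_bytes of the SHA-1 digest, unpadded."""
    raw = hashlib.sha1(data).digest()[:n_bytes]
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
```

Five bytes (the same 40 bits as 10 hex digits) come out as 7 characters, and six bytes (48 bits) fit in 8.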
If you're using .net, RequestReduce [1] is an excellent tool for managing your static assets.

[1] - https://github.com/mwrock/RequestReduce

Also check out Cassette http://getcassette.net/
+1 for this link. I can't understand how I haven't come across it before...
Is ImageOptim still a good choice to go with? I really like the simplicity of the GUI.
Yes, ImageOptim is a wrapper around optipng and several other programs. It tries them all and goes with the one that gives the best compression for that specific file. It also supports JPEG and GIF files.
If you're doing it manually, ImageOptim is probably the best GUI available.