Hacker News
by Someone1234 2471 days ago
Immutable JavaScript/CSS/Blobs/etc.

We have a very typical web codebase: server-side code (e.g. business rules, database access, etc.), server-side HTML generation, and JavaScript/CSS/images/fonts/etc. stored elsewhere. That makes two repositories ("content" and "code").

So the obvious question is: how do you manage deployment? Two repositories mean two deployments, which means potential timing problems and rollback difficulties.

The solution we use is painfully simple: we define the JavaScript/CSS/etc. as immutable (no edits, no deletes) and version it. If you want to fix a bug in example.js, it becomes example.js 1.0.1, then 1.0.2, and so on, and you re-point references to the new version. The old versions still exist, so old or outdated references continue to work.
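
A minimal sketch of the add-only rule, assuming an in-memory map stands in for the real content store and a hypothetical `name.version.ext` naming scheme (the post only says example.js becomes 1.0.1, 1.0.2, and so on):

```typescript
// Hypothetical naming scheme: "example.js" at "1.0.1" -> "example.1.0.1.js".
function versionedPath(file: string, version: string): string {
  const dot = file.lastIndexOf(".");
  if (dot <= 0) return `${file}.${version}`;
  return `${file.slice(0, dot)}.${version}${file.slice(dot)}`;
}

// An add-only store: re-publishing an existing path is rejected, so old
// references keep resolving to exactly the bytes they always did.
class ImmutableAssetStore {
  private readonly assets = new Map<string, string>();

  publish(file: string, version: string, content: string): string {
    const path = versionedPath(file, version);
    if (this.assets.has(path)) {
      throw new Error(`${path} is already published; bump the version instead`);
    }
    this.assets.set(path, content);
    return path;
  }

  get(path: string): string | undefined {
    return this.assets.get(path);
  }
  // Deliberately no update() or delete().
}

const store = new ImmutableAssetStore();
const oldPath = store.publish("example.js", "1.0.1", "console.log('buggy');");
store.publish("example.js", "1.0.2", "console.log('fixed');");
// Pages still pointing at oldPath ("example.1.0.1.js") keep working.
```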

This also allows an aggressive cache policy. We don't have to worry about browsers or intermediate proxies caching our resources for "too" long. In our experience, editing files in place was never reliable anyway, whatever the cache policy; some browsers seemingly ignore it (Chrome!).
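
A sketch of the cache policy this enables, assuming versioned assets are served under a hypothetical `/content/` prefix. Because a published path never changes, it can get a one-year `max-age` plus the `immutable` directive (defined in RFC 8246; caches that don't understand it simply ignore it and still honor `max-age`), while the server-generated HTML that points at specific versions is revalidated:

```typescript
// Hypothetical layout: immutable, versioned assets live under /content/.
const IMMUTABLE_PREFIX = "/content/";
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

function cacheControlFor(path: string): string {
  if (path.startsWith(IMMUTABLE_PREFIX)) {
    // Safe to cache "forever": the bytes behind a published path never change.
    return `public, max-age=${ONE_YEAR_SECONDS}, immutable`;
  }
  // HTML decides which asset versions to reference, so it must be revalidated.
  return "no-cache";
}
```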

We always deploy the "content" repository ahead of the "code" repository. If we need to roll back "code," it doesn't matter, because the old versions of "content" were never deleted or altered.

There's never a situation where we'd roll back "content," because you only add; you never edit or delete. If you added a bad version or a bug, just bump the version number and add the fix (or reference the older version until a fix lands in "content"; the old version will still be there).
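
The deploy ordering can be expressed as a pre-deploy check. This is a sketch under assumptions: the "code" release can list the versioned asset paths it references, and the "content" store can report what it has published (both hypothetical here):

```typescript
// Asset paths a "code" release references that "content" hasn't published yet.
function missingAssets(referenced: string[], published: ReadonlySet<string>): string[] {
  return referenced.filter((path) => !published.has(path));
}

// Gate the "code" deploy: "content" must already contain everything it points to.
function canDeployCode(referenced: string[], published: ReadonlySet<string>): boolean {
  const missing = missingAssets(referenced, published);
  if (missing.length > 0) {
    console.error(`Deploy "content" first; missing: ${missing.join(", ")}`);
    return false;
  }
  return true;
}

// Content is add-only, so an older release that passed when it shipped still
// passes, which is what makes rolling back "code" safe.
const publishedAssets = new Set(["example.1.0.1.js", "example.1.0.2.js", "site.2.3.0.css"]);
const newRelease = ["example.1.0.2.js", "site.2.3.0.css"];
const rollbackTarget = ["example.1.0.1.js", "site.2.3.0.css"];
console.log(canDeployCode(newRelease, publishedAssets), canDeployCode(rollbackTarget, publishedAssets)); // true true
```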


A much easier way than this is to append a hash of the file instead of 'versioning' it. Some people add it as a query string, some add it into the filename.

Been doing this for years with infinite (well, practically) cache settings.

These days it's built into most js compression tools afaik.
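
A sketch of both placements, filename and query string. To stay dependency-free it uses a 32-bit FNV-1a hash over UTF-16 code units, a non-cryptographic stand-in for the digest a real build tool would compute:

```typescript
// FNV-1a (32-bit): a stand-in for a real content digest. Not collision-resistant.
function fnv1a(content: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hash in the filename: "app.js" -> "app.<hash>.js".
function hashedFilename(file: string, content: string): string {
  const dot = file.lastIndexOf(".");
  const hash = fnv1a(content);
  return dot <= 0 ? `${file}.${hash}` : `${file.slice(0, dot)}.${hash}${file.slice(dot)}`;
}

// Hash as a query string: "app.js" -> "app.js?v=<hash>".
function hashedQuery(file: string, content: string): string {
  return `${file}?v=${fnv1a(content)}`;
}
```

Either way, the URL changes exactly when the file's content changes, which is what makes "infinite" cache lifetimes safe.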

I think this is the official way to do that:

https://developer.mozilla.org/en-US/docs/Web/Security/Subres...

That doesn't work, because no file exists in isolation. If you're using version 32.14 of this, you want version 32.14 of that, and of this other thing too. Versioned directories make this kind of grouping natural and easy; co-mingled hashes do not (you could do both, but then you get the downsides of both and no real upside).

Plus, semantic versioning helps cross-team communication; there's no human understanding of raw hashes.
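
A sketch of the versioned-directory idea, assuming a hypothetical `/lib/<name>/<version>/<file>` path scheme and three-part semantic versions. One version string pins every file in the group, and the version itself tells a human something a hash can't, such as whether an upgrade is expected to break things under semver conventions:

```typescript
interface SemVer { major: number; minor: number; patch: number; }

function parseSemVer(v: string): SemVer {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`not a semantic version: ${v}`);
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

// One version pins every file in the group: /lib/<name>/<version>/<file>.
function groupUrls(name: string, version: string, files: string[]): string[] {
  parseSemVer(version); // reject malformed versions early
  return files.map((f) => `/lib/${name}/${version}/${f}`);
}

// Under semver, only a major bump signals a breaking change.
function isBreakingUpgrade(from: string, to: string): boolean {
  return parseSemVer(to).major !== parseSemVer(from).major;
}
```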

You don't necessarily need a hash based on randomness. Either the git commit as the version/hash or a hash of the file's content works.

As long as your entry point and its references are versioned, everything follows from that. If I load version A of index.html, it points to version A of the scripts/styles. If you load version B, you get version B of the scripts/styles, since everything is versioned the same way.
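
A sketch of "everything follows from the entry point", assuming a hypothetical per-release manifest that maps logical asset names to versioned URLs. The server renders the page from the manifest of its own release, so version A of the page can only reference version A assets:

```typescript
// Hypothetical manifest: logical asset name -> URL for one release.
type Manifest = Record<string, string>;

function renderIndexHtml(manifest: Manifest): string {
  const script = manifest["app.js"];
  const style = manifest["app.css"];
  if (!script || !style) throw new Error("manifest is missing an entry");
  return [
    "<!doctype html>",
    `<link rel="stylesheet" href="${style}">`,
    `<script src="${script}" defer></script>`,
  ].join("\n");
}

const releaseA: Manifest = { "app.js": "/assets/A/app.js", "app.css": "/assets/A/app.css" };
const releaseB: Manifest = { "app.js": "/assets/B/app.js", "app.css": "/assets/B/app.css" };
```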

Using the git commit is a bad solution; you want the file hash, so that multiple bundles version themselves automatically. Also, with a commit-based version, pushing even a change to some comments or something unrelated to the code changes your bundle's URL.

Often you have a bundle of library code that you rarely change, and you don't want your clients to re-download it every time you make minor changes.
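
A sketch of the difference, with hypothetical bundle names and the same kind of dependency-free FNV-1a stand-in for a real content digest. With a commit-based version every bundle's URL changes on every push; with a content hash, the rarely touched vendor bundle keeps its URL:

```typescript
// Dependency-free stand-in for a real content digest (FNV-1a, 32-bit).
function contentHash(content: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    h = Math.imul(h ^ content.charCodeAt(i), 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

type Bundles = Record<string, string>; // hypothetical: bundle name -> built output

// Commit-based: every bundle gets the commit id, whether or not its output changed.
function urlsByCommit(bundles: Bundles, commit: string): string[] {
  return Object.keys(bundles).map((name) => `/js/${name}.${commit}.js`);
}

// Content-based: each bundle's URL depends only on its own built output.
function urlsByContent(bundles: Bundles): string[] {
  return Object.entries(bundles).map(([name, output]) => `/js/${name}.${contentHash(output)}.js`);
}

// Two pushes that only touch application code:
const push1: Bundles = { vendor: "/* rarely changing library code */", app: "render(1);" };
const push2: Bundles = { vendor: "/* rarely changing library code */", app: "render(2);" };
// urlsByCommit: both URLs change between pushes, so clients refetch vendor too.
// urlsByContent: only the app URL changes; the cached vendor bundle is reused.
```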

The file hash of the output file, that's what I meant by hash.

Your way has big downsides and is pretty old-fashioned. It's still used by libraries that only have one JavaScript file, but not by websites that have to have multiple bundles and multiple CSS files.

Here's webpack's advice about doing exactly what I'm advocating; it definitely works and is the industry standard:

https://webpack.js.org/guides/caching/

Firstly, the file hash mechanism is built into most bundling tools, and it means you never ever, ever, ever get collisions, make mistakes, or forget to increase the version number. It's all handled automatically in the build process.

But on top of that, you can also have multiple bundles, and they will version themselves automatically. It's common for sites to have multiple bundles, so when you commit a change and rebuild the site, some of those bundles will not have changed. With automatic hashing, the browser only downloads the bundles that changed, and you aren't serving tons of unnecessary JavaScript. You might have one bundle of rarely changing shared libraries, another for the sales part of the website, another for the client part, another for the admin section, a video player that only parts of the site use, etc.

Each release you do would only force the browser to download bundles that actually changed.
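
A sketch of that effect, assuming two hypothetical build manifests (bundle name to hashed URL; the hashes are made up). A browser holding the previous release only has to fetch the URLs it hasn't cached, which is exactly the set that changed:

```typescript
type BuildManifest = Record<string, string>; // bundle name -> hashed URL

// URLs in the next release that a browser holding the previous release hasn't cached.
function urlsToDownload(previous: BuildManifest, next: BuildManifest): string[] {
  const cached = new Set(Object.values(previous));
  return Object.values(next).filter((url) => !cached.has(url));
}

const before: BuildManifest = {
  shared: "/js/shared.3f9a1c2e.js",
  sales: "/js/sales.77b0d4aa.js",
  admin: "/js/admin.0c1d2e3f.js",
};
// A release that only touched the sales section:
const after: BuildManifest = { ...before, sales: "/js/sales.9e8d7c6b.js" };
// urlsToDownload(before, after) -> ["/js/sales.9e8d7c6b.js"]
```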

For example, Reddit has:

    https://www.redditstatic.com/_chat.Q8BtxnzGjSI.js
    https://www.redditstatic.com/crossposting.4zJErPF9qdo.js
    https://www.redditstatic.com/reddit-init.en.zJ5ikJ21-Gw.js
    https://www.redditstatic.com/reddit.en.BQfJLVYdPSA.js
    https://www.redditstatic.com/spoiler-text.vsLMfxcst1g.js

Or Stack Overflow:

    https://cdn.sstatic.net/Js/full.en.js?v=b45d5b4c957c
    https://cdn.sstatic.net/Js/stub.en.js?v=963cc3083a37
    https://clc.stackoverflow.com/markup.js?omni=Ak4r5CHnPNcIR2AAAAAAAAACAAAAAQAAAAMAAAAAAKpYlh7uXVCYJKM&zc=24%3B4&pf=0&lw=165

Or Stack Overflow's CSS:

    https://cdn.sstatic.net/Shared/stacks.css?v=897466c4b64a
    https://cdn.sstatic.net/Sites/stackoverflow/primary.css?v=2d33230dde3d

See how they have a "shared" CSS file they use on all the Stack Exchange sites and a "stackoverflow" one, and that they can release each without destroying the cached version of the other?

You're talking about something entirely different from what I'm talking about. We aren't bundling at all. We're minifying and relying on H2 (HTTP/2) for high-performance concurrent delivery. Bundling is the only old-fashioned thing here. Semantic versioning is timeless.

You're talking about a mechanism designed purely for cache busting. I'm talking about a mechanism for humans to deploy, understand, and use libraries across teams (and to group different files into distinct versions). Apples and oranges. The thread was about architecture, after all...

I won't get drawn too far into your post, since it has too many strongly held claims without explanation or justification, and I don't feel like trying to unravel them. But yes, if you're automatically generating bundles for HTTP/1.1, append a hash. We aren't, so we don't.

And you should still be minifying your CSS and JS, even if it's just to strip out the comments, and it's still better to use file hashes than piss around with versioning.

Doesn't matter how much you dance around it, this wasn't a good architectural decision, nor is it standard industry practice.

> And you should still be minifying your CSS and JS

We do, as the post you replied to said.

> Doesn't matter how much you dance around it, this wasn't a good architectural decision, nor is it standard industry practice.

Just because you happen to believe something doesn't make it "standard industry practice." Repeating the same unsupported claims with extra conviction doesn't make for an argument (persuasive or otherwise). Semantic versioning and versioning libraries (grouped by version) are very much standard; in fact, the industry's most popular CDN (by far) does exactly that:

https://cdnjs.com/libraries/jquery/

https://cdnjs.com/libraries/angular.js

Your "solution" solves only one issue well: cache busting. Versioned directories solve that issue and others too (deployment, human understanding, grouping associated resources together).

I'm not sure you yourself even know why you believe this. You seem to have read webpack's docs, decided that's how it should work, and now view it as a one-size-fits-all solution to completely unrelated problems (i.e. it isn't an architectural/organizational answer; it's a technological one for cache busting, and thus irrelevant to the topic).

If you have anything of substance to add, by all means, but so far your posts are strong in conviction and weak in justification (technical or organizational). You keep arguing from authority but forgot to say who the authority is meant to be.

I have recently been struggling with versioning (and learning DevOps in general) myself, so I would love to hear more on this topic. For example, if you roll back a deployment (or just have browsers that haven't refreshed yet), how do you make sure browser clients are talking to the right api backend version? How do you force them to upgrade or rollback? Will they even be routed to the same api server on multiple calls?

This is especially bad with long-lived single page apps.

(I already use immutable static files, auto-generated and hashed by Create React App. I rely on Cloudflare to cache them forever rather than never deleting them from the build, though.)

> how do you make sure browser clients are talking to the right api backend version?

We version the URL itself.
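
A minimal sketch of URL versioning, assuming a hypothetical `/api/v<N>/` prefix and a server that keeps a set of still-supported versions. The client build bakes in the API version it was written against, so a page loaded before a deploy keeps calling a version it understands for as long as the server still routes it:

```typescript
// Hypothetical: the API version this client build was written against.
const CLIENT_API_VERSION = 2;

function apiUrl(path: string, version: number = CLIENT_API_VERSION): string {
  return `/api/v${version}/${path.replace(/^\/+/, "")}`;
}

// Server side (hypothetical): versions that are still routed.
const SUPPORTED_VERSIONS = new Set([1, 2]);

// Returns the requested version if it is still served, otherwise null.
function routeVersion(url: string): number | null {
  const m = /^\/api\/v(\d+)\//.exec(url);
  if (!m) return null;
  const v = Number(m[1]);
  return SUPPORTED_VERSIONS.has(v) ? v : null;
}
```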

> How do you force them to upgrade or rollback?

We can embed an "obsolete" tag in the HTTP/AJAX response headers, which a global AJAX hook (jQuery) reads to bring up a prompt or force a page reload. We use it infrequently, but it was added for just such an occasion.

It is a bad user experience but it is a useful tool.
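
A sketch of the same idea without jQuery, as a plain check you would call from a fetch wrapper on every API response. The header name `X-Obsolete` is a hypothetical choice (the post only says an "obsolete" tag is sent), and the header lookup has the same shape as `Headers.prototype.get`:

```typescript
// Hypothetical header name; the post only mentions an "obsolete" tag.
const OBSOLETE_HEADER = "X-Obsolete";

type HeaderLookup = (name: string) => string | null; // same shape as Headers.prototype.get

// Returns true (and notifies) if the server marked this client as obsolete.
function checkObsolete(getHeader: HeaderLookup, onObsolete: (reason: string) => void): boolean {
  const value = getHeader(OBSOLETE_HEADER);
  if (value === null) return false;
  onObsolete(value || "This page is out of date.");
  return true;
}

// In a browser, onObsolete might prompt and reload, e.g.:
//   checkObsolete((n) => response.headers.get(n), (msg) => {
//     if (window.confirm(`${msg} Reload now?`)) window.location.reload();
//   });
```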

That's a great solution, and I think that's what a lot of webpack build systems do.

In Angular, if a source file changes, the corresponding build file's hash changes. They call it cache busting because it breaks the cache.

What kind of web stack are you running?

We have Java and .NET Core (which we're trying as a replacement) for internet-facing services, and Node.js for internal APIs. All on Linux. Some of this is for organizational reasons, not technical ones.

As for the "content" side, it is pretty stereotypical: Sass, TypeScript, AngularJS 1.xx (not a typo!), and too many npm dependencies. But there's too much NIH[0] between teams, which is why our structure is so important in other ways.

[0] https://en.wikipedia.org/wiki/Not_invented_here