by oefrha 1166 days ago
Slightly related: Netlify has/had an even bigger problem around caching, and not just caching.

I set `cache-control: public,max-age=2592000,immutable` on my SPAs' assets as they're hashed and should be immutable.

But Netlify somehow doesn't atomically swap in a new version. Say my index.html referenced /assets/index.12345678.js before and is updated to reference /assets/index.87654321.js instead: there's a split second where Netlify could serve the new index.html referencing /assets/index.87654321.js while /assets/index.87654321.js returns 404! So users accessing the site in that split second may get a broken site. Worse still, the assets directory is covered by the _headers rule adding the immutable, 30-day cache-control header, and Netlify will even add it to the 404 response... The result is that the user's page stays broken indefinitely until I push out a new build (possibly bricking another set of users) or they clear the cache, which isn't something the average Joe should be expected to do.

I ended up having to write a generator program to manually expand /assets/* in _headers to one rule for each file after every build. And users still get a broken page from time to time, but at least they can refresh to fix it. It really sucks.
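
For anyone curious, a minimal sketch of that kind of generator could look like the following. This is not my actual script: `expand_asset_rules` and the directory layout are made up for illustration, and it assumes the usual `_headers` layout of a path line followed by indented header lines.

```python
from pathlib import Path

CACHE_HEADER = "Cache-Control: public,max-age=2592000,immutable"


def expand_asset_rules(dist_dir: str, assets_subdir: str = "assets") -> str:
    """Return _headers text with one rule per file that exists in the build.

    Hypothetical helper: only files present in this build get a rule,
    instead of a single /assets/* wildcard.
    """
    dist = Path(dist_dir)
    rules = []
    for path in sorted((dist / assets_subdir).rglob("*")):
        if path.is_file():
            # URL path relative to the site root, e.g. /assets/index.12345678.js
            url_path = "/" + path.relative_to(dist).as_posix()
            rules.append(f"{url_path}\n  {CACHE_HEADER}\n")
    return "\n".join(rules)
```

You'd run something like this after every build and write the result into the `_headers` file in the publish directory.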

3 comments

This is a fundamental flaw with the whole "atomic deploy" model. The web is a distributed system and you can't just pretend that it isn't.

https://kevincox.ca/2021/08/24/atomic-deploys/

In this case it is possible that Netlify could have avoided the issue where the new HTML loads the old asset, but that would just make the opposite problem worse: the old HTML getting a 404 because the new asset has been swapped in.

At the end of the day, hashed assets are a great idea, but you need to keep multiple versions around.

I wonder if somehow the request got handled by two different edge workers that were desynchronized? I’ve seen it happen in busy areas (NYC, etc.) where a single client will hit many Workers in a session whereas when connecting from a rural area I’ve never observed that.

Regardless, I say the solution is fat index files. Is there any tangible benefit to the long-held tradition of separating the structure from the functionality from the styling? Seems to me like that’s just asking for trouble.

I mostly use Vite nowadays, so my bundle is usually automatically split into a vendor.js and an index.js. The vendor bundle for dependencies is large (usually 50-200KB brotli'ed for my popular side projects) and rarely changes. The index.js containing only my code is usually smaller and changes on every build. With a fat index everything has to be downloaded on every change. Most people don't care these days but I try to make the experience nice even for people with really shitty connections.

In addition, fat index files are really bad for multi-page apps.

Splitting vendor and index makes good sense; I use esbuild, which has [iffy](https://github.com/evanw/esbuild/issues/207) support for that. Still, the vendor code could be loaded independently while the application code (and styling!) is inlined into the initial index.html response.

> I try to make the experience nice even for people with really shitty connections

I see this sentiment a lot, though people seem to be able to use it to justify any design at all... the question IMO is what does "shitty" mean?

- In a low bandwidth case, serving only the absolute minimum data to do what the user has specifically requested makes good sense. A solution here is Server Components, which can send the client js event handlers on a per-interaction basis.

- In a high latency case, the total number of round trips should be minimized at all costs, so the Server Components approach is terrible: the client may need to wait for 2+ round trips to do their interaction (one to download the client js, one for that client js to perform the actual action). A solution here leans towards the fat index approach.

- In the case where connections drop often, all the data that the client might need should be transferred over as soon as possible, as there's a good chance they won't be able to access the server at the exact moment when the data is needed. The solution here is the fattest indices possible, with copious caching.

These are all three in conflict with each other to some extent, so the best approach is probably dependent on the specifics of your user-base.
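
To put rough numbers on the first two cases, here's a back-of-the-envelope sketch. The model (sequential round trips plus transfer time) is deliberately crude, and every size, latency and bandwidth figure below is a made-up assumption, not a measurement.

```python
def load_time(round_trips: int, rtt_s: float, total_bytes: int, bytes_per_s: float) -> float:
    """Crude estimate: sequential round trips plus raw transfer time."""
    return round_trips * rtt_s + total_bytes / bytes_per_s


# Hypothetical update: a fat index re-downloads everything in one request,
# a split build needs a second round trip but only fetches the changed chunk.
FAT_BYTES = 300_000    # assumed: HTML with vendor and app code inlined
SPLIT_BYTES = 55_000   # assumed: small HTML plus changed index.js, vendor cached

# Low bandwidth, modest latency (assumed 10 KB/s, 100 ms RTT)
fat_low_bw = load_time(1, 0.1, FAT_BYTES, 10_000)
split_low_bw = load_time(2, 0.1, SPLIT_BYTES, 10_000)

# High latency, decent bandwidth (assumed 1 MB/s, 2 s RTT)
fat_high_rtt = load_time(1, 2.0, FAT_BYTES, 1_000_000)
split_high_rtt = load_time(2, 2.0, SPLIT_BYTES, 1_000_000)

print("low bandwidth: split wins" if split_low_bw < fat_low_bw else "low bandwidth: fat wins")
print("high latency: fat wins" if fat_high_rtt < split_high_rtt else "high latency: split wins")
```

With these assumed numbers the split build comes out ahead on the slow link and the fat index comes out ahead on the high-latency link, which is the conflict in a nutshell.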

In my experience, a "shitty connection" is one where the bandwidth is low, latency isn't typically a big deal, but the connection will drop frequently, potentially for hours or more. However, in such cases I'll at times have access to the occasional hotspot where the network is perfectly fine. Accordingly, I design my apps to transfer as much data as possible initially (ensuring that the main content can be seen and interacted with even if data for some other module is transferring in the background), and provide the option to store everything in stale-while-revalidate service worker caches so the full experience is available fully offline to the extent possible, even if you didn't fully explore the app while online. In this way, I can download the latest chunks when on the good connection and run fully offline from then on (obviously excepting actions that are legitimately impossible without a server).

Is the JS file somehow being embedded into index.html on the server side? If not, how do you expect this to be atomic when the user’s browser is making two separate requests (with an arbitrary delay between them)?

The atomicity is for the site update.

If I'm deploying to my server, the structure would look like:

  /srv/example.com/prod -> /srv/example.com/versions/1
  /srv/example.com/versions/1/index.html
  /srv/example.com/versions/1/assets/index.12345678.js
  /srv/example.com/versions/2/index.html
  /srv/example.com/versions/2/assets/index.87654321.js

A new version is atomically swapped in by changing the prod link from versions/1 to versions/2. If you request index.html and get the updated version, there's no scenario where assets/index.87654321.js could 404. Serving an updated index.html and then a 404 for a later request for assets/index.87654321.js is not reasonable. Of course distributed systems are harder, but that's their problem to solve.
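
A minimal sketch of the link swap, assuming a POSIX filesystem. `swap_prod_link` and the temporary link name are made up for illustration, and the link target is written relative to the site root rather than as an absolute path.

```python
import os


def swap_prod_link(site_root: str, version: str, link_name: str = "prod") -> None:
    """Point site_root/link_name at versions/<version> in one atomic step."""
    target = os.path.join("versions", version)  # resolved relative to site_root
    link = os.path.join(site_root, link_name)
    tmp = link + ".tmp"  # hypothetical scratch name next to the real link
    if os.path.lexists(tmp):
        os.remove(tmp)  # clear a leftover from an interrupted earlier run
    os.symlink(target, tmp)
    # os.replace is a rename, which swaps the link itself atomically on POSIX:
    # readers see either the old target or the new one, never a missing link.
    os.replace(tmp, link)
```

So `swap_prod_link("/srv/example.com", "2")` would move prod from versions/1 to versions/2. Building the new link under a temporary name and renaming it over the old one is what avoids the gap you'd get from deleting and recreating the link.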

Note that with a naive web server and the layout above, one could get an old index.html but no assets/index.12345678.js by the time the .js file is requested, but that's less problematic and could be covered by some lingering cache. Or I could simply include the last build's assets as there's no conflict potential.

> Or I could simply include the last build's assets as there's no conflict potential.

It looks like your build puts the hashes into the file names for each asset (instead of just naming resources purely as the output from the hashing function). If you're using a halfway decent hash function, you're ~never going to get a hash conflict even across all of your assets, let alone across all the versions of an asset for a single source file name.

You could just leave all the old assets in place (especially since many of them won't change from build to build) and prune hashed assets that you know haven't been referenced from an index.html in >1 month.
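
A rough sketch of that pruning rule, assuming you keep a record of each deploy's index.html and when it went live. `prunable_assets`, the 30-day cutoff and the `/assets/` regex are illustrative assumptions. It only sees assets named directly in index.html, so chunks loaded by other chunks would need a build manifest instead.

```python
import re

ASSET_REF = re.compile(r"/assets/[\w.-]+")
MONTH_SECONDS = 30 * 24 * 3600  # assumed cutoff for ">1 month"


def prunable_assets(all_assets, deploys, now, max_age=MONTH_SECONDS):
    """Return assets not referenced by any index.html served within max_age.

    deploys: list of (deployed_at, index_html_text), oldest first. Each
    index.html is treated as served until the next deploy replaces it.
    """
    keep = set()
    for i, (_, html) in enumerate(deploys):
        is_live = i == len(deploys) - 1
        # The live deploy is still being served; older ones stopped being
        # served when their successor went out.
        served_until = now if is_live else deploys[i + 1][0]
        if now - served_until <= max_age:
            keep.update(ASSET_REF.findall(html))
    return set(all_assets) - keep
```

Anything it returns is safe to delete only under those assumptions. An asset shared by an old and a recent deploy (an unchanged vendor chunk, say) stays because the recent index.html still references it.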