Hacker News new | ask | show | jobs
by oezi 617 days ago
I would assume that you would only create a new UUID if the content of the tagged file changed serverside.

Benefits are readability and reduced amount of data to be transferee. UUID is reasonably save to be unique for the ETag use case (I think 64 bits actually would be enough).

2 comments

The point of the content hash is to make it trivial to verify that the content hasn’t changed from when its hash was made. If you just make a uuid that has nothing to do with the file’s contents, you could easily forget to update the UUID when you do change its content, leading to invalid caches (or generate a new UUID even though the content hasn’t changed, leading to wasteful invalidation.)

Having the filename be a simple hash of the content guarantees that you don’t make the mistakes above, and makes it trivial to verify.

For example, if my css files are compiled from a build script, and a caching proxy sits in front of my web server, I can set content-hashed files to infinite lifetime on the caching proxy and not worry about invalidating anything. Even if I clean my build output and rebuild, if the resulting css file is identical, it will get the same hash again, automatically. If I used UUID’s and blew away my output folder and rebuilt, suddenly all files have new UUID’s even though their contents are identical, which is wasteful.

SHA256 has the benefit that you can generate the ETAG deterministically without needing to maintain a database (i.e. content-based hashing). That way you also don’t need to track if the content changes which reduces bugs that might creep in with UUIDs. Also, if typically you only update a subset of all files, then aside from not needing to keep track of assigned UUIDs per file, you can do a partial update. Reasons to do content-based hashing are not invalidated because of a new UUID format.