by ta8908695 2111 days ago
The external assets for a page could be archived separately though, right? I would think that the static G+ assets (JS, CSS, images, etc.) could be archived once, and then all the remaining data would be much closer to the 120B of real content. Is there a technical reason that's not the case?
1 comment

In theory.

In practice, this would likely involve recreating at least some of the presentation side of numerous Web apps that keep changing (some constantly), which is a substantial programming overhead.

WARC is dumb as rocks, from a redundancy standpoint, but also atomically complete, independent (all WARCs are entirely self-contained), and reliable. When dealing with billions of individual websites, these are useful attributes.

It's a matter of trade-offs.
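
To put rough shape on that trade-off, here's a toy sketch in Python. It is not how WARC tooling works; the page data, file names, and both function names are made up for illustration. It just compares the bytes stored when every archive carries its own copy of each asset against a store that keeps identical payloads once, keyed by hash.

```python
import hashlib


def self_contained_size(pages):
    """Total bytes when every page archive carries its own copy of each asset."""
    return sum(len(body) for page in pages for body in page.values())


def deduplicated_size(pages):
    """Total bytes when identical payloads are stored once, keyed by SHA-256."""
    store = {}
    for page in pages:
        for body in page.values():
            # Identical bytes hash to the same key, so repeats are not stored again.
            store.setdefault(hashlib.sha256(body).hexdigest(), body)
    return sum(len(body) for body in store.values())


# Hypothetical data: two pages sharing one stylesheet, each with its own content.
shared_css = b"body { margin: 0 }" * 100
pages = [
    {"style.css": shared_css, "index.html": b"<p>post one</p>"},
    {"style.css": shared_css, "index.html": b"<p>post two</p>"},
]

saved = self_contained_size(pages) - deduplicated_size(pages)
print(f"bytes saved by storing the shared asset once: {saved}")
```

The saving is real, but in the second layout neither page can be replayed without the shared store, which is exactly the independence that the self-contained approach keeps.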