by jeff_tyrrill
456 days ago
I've been using HTTrack for almost two decades to create static archives of a yearly website for an annual event. It doesn't do the job 100%, but it's a start. In particular, HTTrack does not support srcset, so only the default (1x) pixel-density images were archived (though I manually edited the archives to inject the high pixel-density images, along with numerous other necessary fix-ups).

The benefit of the tool is fine control over the crawling process and over which files are included. Included files have their URLs rewritten in the archived HTML (and CSS) to account for query strings, absolute vs. relative URLs, external paths, etc. Non-included files also have their URLs rewritten, from relative to absolute links. As a result, you can browse the static archive, and non-included assets still work as long as they are online at their original URL, even if the static archive is on local storage or hosted at a different domain than the original site.

It was more work each year as the website gradually used script in more places, which meant more and more places where I had to manually touch up the archive to make it browsable. The website was not itself an SPA, but it contained SPAs on certain pages; my goal was to capture a snapshot of the initial HTML paint of these SPAs, not to have them functional beyond that. This was (expectedly) beyond HTTrack's capabilities.

At least one other team member wanted to investigate https://github.com/Y2Z/monolith as a potential modern alternative.
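
For illustration, the srcset fix-up and the relative-to-absolute rewriting mentioned above could be sketched roughly like this in Python. This is not what HTTrack does internally, just a minimal sketch: the function names and the example.org URLs are made up, and the parser naively splits on commas, so a URL that itself contains a comma would break it.

```python
from urllib.parse import urljoin


def parse_srcset(srcset, base_url):
    """Return (absolute_url, descriptor) pairs for each srcset candidate.

    Hypothetical helper. Simplification: candidates are split on commas,
    so URLs that themselves contain commas are not handled.
    """
    pairs = []
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue  # skip empty entries, e.g. after a trailing comma
        # Resolve the URL against the page it appeared on; keep the
        # descriptor ("2x", "640w", or empty) unchanged.
        pairs.append((urljoin(base_url, parts[0]), " ".join(parts[1:])))
    return pairs


def absolutize_srcset(srcset, base_url):
    """Rebuild a srcset value in which every URL is absolute."""
    return ", ".join(
        f"{url} {desc}".strip() for url, desc in parse_srcset(srcset, base_url)
    )


# Made-up page URL and image paths, purely for demonstration.
page = "https://example.org/2019/talks/index.html"
print(absolutize_srcset("img/a.png 1x, /img/a-2x.png 2x", page))
```

The pairs from `parse_srcset` also give the list of extra high-density images that would need downloading by hand.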
However, after each conference was over, the entire site was downloaded and the HTML files were uploaded statically at the same URLs. This has preserved the sites from 2009 until now. You can still see the old talks and discussions, e.g. https://in.pycon.org/2009/, https://in.pycon.org/2010, etc.
I came across HTTrack around that time and found it interesting, but we used wget to mirror the website. IIRC, it used to refresh itself to copy recursively, but I could be wrong. It's been a long time.