by hncommenter13 4608 days ago
As of version 1.14, wget natively supports WARC (including built-in gzip compression and CDX index file generation).

http://www.archiveteam.org/index.php?title=Wget_with_WARC_ou...

This makes creating a browsable mirror of a site in WARC format fairly straightforward: given --convert-links, wget will make links relative, and given --page-requisites it will fetch the requisite files (CSS, JS, images) for each page.
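
For anyone who wants to try it, here is a rough sketch of the kind of wget command line this implies, built as an argument list in Python. The function name, the example URL and the "example" output prefix are made up for illustration; the flags are standard wget options (1.14 or later for the --warc-* ones), but which ones you want depends on the site.

```python
import shlex


def build_wget_warc_command(url, warc_name):
    """Return an argv list that mirrors `url` and records the crawl to a WARC.

    `warc_name` is a hypothetical output prefix; wget appends the
    .warc.gz (and .cdx) extensions itself.
    """
    return [
        "wget",
        "--mirror",                   # recursive download with timestamping
        "--page-requisites",          # also fetch CSS, JS and images
        "--convert-links",            # rewrite links for local browsing
        f"--warc-file={warc_name}",   # write a gzip-compressed WARC
        "--warc-cdx",                 # also write a CDX index file
        url,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(build_wget_warc_command("https://example.com/", "example")))
```

Building the list rather than a single string keeps the URL from being reinterpreted by a shell if the command is later run from code.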

1 comment

Yeah, but as far as I can guess, derwiki's service doesn't use wget, so running a proxy to store the WARCs is the next-simplest thing.
If his service runs on any sort of Linux distro, it's stupid simple to call wget with a system call. Wget comes standard with all of the most popular distros.
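
To show what I mean by calling wget from a service, here is a minimal sketch in Python. I have no idea what derwiki's service is actually written in, so treat the language, the function name run_wget and the executable parameter as assumptions for illustration only.

```python
import shutil
import subprocess


def run_wget(args, executable=None):
    """Run wget with the given argument list and return its exit code.

    `executable` is a hypothetical override for the binary path; by
    default wget is looked up on PATH.
    """
    exe = executable or shutil.which("wget")
    if exe is None:
        raise FileNotFoundError("wget not found on PATH")
    # Passing a list (no shell=True) avoids shell quoting problems with URLs.
    completed = subprocess.run([exe, *args], check=False)
    return completed.returncode
```

A call such as run_wget(["--page-requisites", "--warc-file=example", "--warc-cdx", "https://example.com/"]) would then write example.warc.gz and example.cdx into the working directory, assuming wget 1.14 or later is installed. A nonzero return value means wget reported an error.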