by joshstrange
4164 days ago
I would love to be able to cache websites to my server and access them in the event the site goes offline. I've tried a number of things like wget to "offline" a website and had mixed success. Does anyone know of a proven way to do something like this? (I'd even settle for no images, a la Google archive/cache, but pulling images and scripts would be a huge win.) I'm younger, but I can already see link rot destroying my bookmarks. I now use (and pay for) pinboard.in, but I'd like a way to do it myself. I've considered writing a Chrome plugin to send the URLs I visit over to a process running on my server to archive them (with the ability to black/whitelist domains), but I haven't found a way to do it yet that works reliably. (I'd also probably need to send a copy of my cookies for sites that require auth.)
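A rough sketch of the black/whitelist check such a server-side process might run before archiving a URL. The function name `should_archive` and the `allow`/`block` parameters are made up for illustration; this is only the domain filter, not the plugin or the archiver.

```python
from urllib.parse import urlsplit


def should_archive(url, allow=frozenset(), block=frozenset()):
    """Decide whether a visited URL should be sent to the archiver.

    Hypothetical helper: `allow` and `block` are sets of bare domains
    such as {"example.org"}. A domain entry also covers its subdomains.
    """
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        # Not a URL with a host part (e.g. "about:blank"), so skip it.
        return False

    def matches(domains):
        return any(host == d or host.endswith("." + d) for d in domains)

    if matches(block):
        return False
    # An empty allow list means "archive everything not blocked".
    return not allow or matches(allow)
```

The blocklist wins over the allowlist here, which seems like the safer default for things like banking or webmail domains, but that ordering is a design choice rather than a requirement.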
|
What about HTTrack[0]? From the description in OpenBSD ports:
HTTrack is an easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.
Or you can use wget, either to download a single page or to do a recursive download. :)
[0]: http://www.httrack.com/
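
For the wget route, here is a minimal sketch of how a script on the server could build the command for either a single page or a recursive mirror. The helper names `wget_command` and `archive` are hypothetical, and the flag selection is one reasonable combination of standard GNU wget options, not the only one. It assumes wget is installed and on the PATH.

```python
import subprocess


def wget_command(url, dest_dir, recursive=False):
    """Build a wget argument list (hypothetical helper, not part of wget)."""
    cmd = [
        "wget",
        "--page-requisites",   # also fetch the images, CSS and scripts a page needs
        "--convert-links",     # rewrite links so the local copy is browsable offline
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--directory-prefix", dest_dir,
    ]
    if recursive:
        # --mirror turns on recursion and timestamping;
        # --no-parent keeps wget from climbing above the start URL.
        cmd += ["--mirror", "--no-parent"]
    cmd.append(url)
    return cmd


def archive(url, dest_dir, recursive=False):
    """Run wget and return its exit code (0 means success)."""
    return subprocess.run(wget_command(url, dest_dir, recursive)).returncode
```

Calling `archive("http://www.httrack.com/", "archive")` would fetch that one page plus its requisites into `archive/`; passing `recursive=True` mirrors everything under the start URL. Pages behind a login would still need cookies passed along, which this sketch does not handle.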