Hacker News
by Piskvorrr 3274 days ago
That's just as much "my own" as The Internet Archive: a website Out There somewhere. Worse, it's much more likely to rot and disappear than archive.org. Now, if I could run this locally...

(Yes, yes, `wget --convert-links`, I know. Not quite as convenient, though.)

3 comments

OP here. The Internet Archive is great, but it's not so awesome if there's some ephemeral content you need to save right away, like Tweets or social media posts. Being able to trigger an archive immediately lets you save temporary content like that, which is more prone to deletion. I'm going to build a Chrome extension that makes a cloud copy of the page you're on with one click; hopefully that will make it seem more personally controllable.

Do you think being able to download the archive locally would be useful?

This might sound insane, but if you modified this into a browser extension that runs locally (with options for one-off or continuous saving for entire browsing sessions) I would probably download it. Personally, I have well over 100TB of personal hard drive space in my home, and I would love to just download entire portions of my browsing history locally for archival reasons (and to truly defeat link rot).

As it is now, I personally wouldn't use it (but it's a cool project, definitely please keep working on this idea!).

> modified this into a browser extension

I was just thinking about this last night while I was explaining my use of the Firefox tab groups extension to a friend. I use bookmarks and tabs to keep track of information. Neither is fully convenient and the whole system fails whenever a page changes or a link rots.

I would love a system that archives a page I bookmark so that the bookmark will always work to give me that information. Give me an 'ephemeral' checkbox if I want my bookmark to change when the site changes. Hmmm.

I made a browser extension [1] that automatically archives bookmarks to archive.is or (currently Chromium only) locally as MHTML files.

[1]: https://github.com/rahiel/archiveror

Cool, Rahiel! Thanks for doing this.

Here is a related idea I proposed a couple years ago to a Knight News Challenge on Libraries: https://web.archive.org/web/20161104175911/https://www.newsc... "Create a browser addon so when people post to the web they can send a copy for storage and hosting by a network of local libraries. ... While the Internet Archive is backing up some of the internet, it is another single point of failure. We propose developing data standards, software applications, coordination protocols. and hardware specifications so every local library in the world can participate in backing up part of the internet. ..."

Sad that the Knight News Foundation has changed their software and so all the old Knight News contributions are no longer available. It's an example of the very thing that contribution was about -- the need for distributed backups. Glad that info is still findable in archive.org -- until perhaps the Knight News Foundation puts up a broad robots.txt and makes it all inaccessible.

Thanks again for creating a great plugin!

What about IPFS for storing cached pages?
For one-off, you can use a bookmarklet:

For example, Wayback Machine:

Save Current URL: javascript:q=(document.location.href);void(open('http://web.archive.org/save/*/'+location.href.replace(/https..., ""),'_self ','resizable,location,menubar,toolbar,scrollbars,status'));

GoBack Current URL: javascript:q=(document.location.href);void(open('http://web.archive.org/web/*/'+location.href.replace(/https?..., ""),'_self ','resizable,location,menubar,toolbar,scrollbars,status'));
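For anyone who finds the one-liners hard to read, here is a rough sketch of the same idea as plain functions. The function names are made up for illustration, and it passes the full page URL through unchanged rather than reproducing the (truncated) regex above; the two URL patterns are the ones used in the bookmarklets and elsewhere in this thread.

```javascript
// Hypothetical helpers; the names are illustrative and not part of any archive.org client.
const WAYBACK = "https://web.archive.org";

// URL that asks the Wayback Machine to capture `pageUrl` right now.
function waybackSaveUrl(pageUrl) {
  return WAYBACK + "/save/" + pageUrl;
}

// URL that lists the existing captures of `pageUrl`.
function waybackHistoryUrl(pageUrl) {
  return WAYBACK + "/web/*/" + pageUrl;
}

// Inside a bookmarklet you would navigate instead of logging, e.g.
//   location.href = waybackSaveUrl(location.href);
console.log(waybackSaveUrl("https://tesoro.io/"));
console.log(waybackHistoryUrl("https://tesoro.io/"));
```

Keeping the URL building separate from the navigation makes it easy to reuse the same helpers in an extension or a script.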

Using Zotero's webshot feature should do this. I use it this way.
Great idea, thanks for the feedback :)
So, like another top-level commenter asked: why build this or use this instead of archive.is? And there are already multiple extensions available for Chrome for it ;)

I agree with GP here that anything billed as "My own internet archive" should run on my computer, not someone else's.

Hi agamble,

You can do just that via https://chrome.google.com/webstore/detail/warcreate/kenncghf... http://warcreate.com.

I am a core contributor to this project on GitHub (https://github.com/machawk1/warcreate) and the maintainer/creator of the latest version of WAIL. So I am not biased in any way ;)

You can trigger the Internet Archive manually as well.
Oh neat, I didn't realise that, my mistake :)
To be fair, the easiest way to do it (to my knowledge) is by direct URL entry - for example (replace 'hxxps' with 'https' for this - I didn't want crawlers to pick this up):

hxxps://web.archive.org/save/https://tesoro.io/
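If you would rather not retype the scheme by hand, a tiny helper can undo the "hxxp" defanging before you open the link. This is only a sketch; the `refang` name is my own and it handles nothing beyond the scheme prefix.

```javascript
// Hypothetical helper: turn a defanged "hxxp://" or "hxxps://" link back into a usable one.
// Only the leading scheme is rewritten; any URL embedded later in the path is left alone.
function refang(defanged) {
  return defanged.replace(/^hxxp(s?):\/\//, "http$1://");
}

console.log(refang("hxxps://web.archive.org/save/https://tesoro.io/"));
```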

A local download only increases the redundancy. Tesoro keeps a copy, and the user keeps a local copy that they can use however they want. It's a bit like keeping newspaper clippings that some relative finds decades later and posts on social websites as something interesting.
Good work. For research and citation purposes, a permalink is needed outside the source domain, one that can be trusted and will stay around for decades.
https://github.com/webrecorder/webrecorder can be run using Docker. There are also plenty of proxies that can save your browsing. See: http://netpreserve.org/projects/live-archiving-http-proxy/
Have you looked at WorldBrain? It is a fork of Falcon, but it keeps a cache and lets you perform keyword searches against the cached content.