


	by Piskvorrr 3321 days ago
	That's just as much "my own" as The Internet Archive: a website Out There somewhere. Worse, it's much more likely to rot and disappear than archive.org. Now, if I could run this locally... (Yes, yes, `wget --convert-links`, I know. Not quite as convenient, though.)

3 comments

agamble 3321 days ago

OP here. The Internet Archive is great, but it's not so awesome if there's some ephemeral content you need to save right away, like Tweets or social media posts. Being able to trigger an archive immediately lets you save temporary content like that, which is more prone to deletion. I'm going to build a Chrome extension that makes a cloud copy of the page you're on with one click; hopefully that will make it feel more personally controllable.

Do you think being able to download the archive locally would be useful?

ia_user 3321 days ago

This exists for (& from) the Internet Archive!

Firefox: https://addons.mozilla.org/en-US/firefox/addon/wayback-machi...

Chrome: https://chrome.google.com/webstore/detail/wayback-machine/fp...

Safari: https://safari-extensions.apple.com/details/?id=archive.org....

Android: https://play.google.com/store/apps/details?id=com.archive.wa...

iOS: https://itunes.apple.com/us/app/wayback-machine/id1201888313

dsacco 3321 days ago

This might sound insane, but if you modified this into a browser extension that runs locally (with options for one-off or continuous saving for entire browsing sessions) I would probably download it. Personally, I have well over 100TB of personal hard drive space in my home, and I would love to just download entire portions of my browsing history locally for archival reasons (and to truly defeat link rot).

As it is now, I personally wouldn't use it (but it's a cool project, definitely please keep working on this idea!).

chongli 3321 days ago

> modified this into a browser extension

I was just thinking about this last night while I was explaining my use of the Firefox tab groups extension to a friend. I use bookmarks and tabs to keep track of information. Neither is fully convenient and the whole system fails whenever a page changes or a link rots.

I would love a system that archives a page I bookmark so that the bookmark will always work to give me that information. Give me an 'ephemeral' checkbox if I want my bookmark to change when the site changes. Hmmm.

rahiel 3321 days ago

I made a browser extension [1] that automatically archives bookmarks to archive.is or (currently Chromium only) locally as MHTML files.

[1]: https://github.com/rahiel/archiveror

pdfernhout 3321 days ago

Cool, Rahiel! Thanks for doing this.

Here is a related idea I proposed a couple of years ago to a Knight News Challenge on Libraries: https://web.archive.org/web/20161104175911/https://www.newsc... "Create a browser addon so when people post to the web they can send a copy for storage and hosting by a network of local libraries. ... While the Internet Archive is backing up some of the internet, it is another single point of failure. We propose developing data standards, software applications, coordination protocols, and hardware specifications so every local library in the world can participate in backing up part of the internet. ..."

Sad that the Knight News Foundation has changed their software and so all the old Knight News contributions are no longer available. It's an example of the very thing that contribution was about -- the need for distributed backups. Glad that info is still findable in archive.org -- until perhaps the Knight News Foundation puts up a broad robots.txt and makes it all inaccessible.

Thanks again for creating a great plugin!

WhiteOwlLion 3321 days ago

What about IPFS for storing cached pages?

WhiteOwlLion 3321 days ago

For one-off, you can use a bookmarklet:

For example, Wayback Machine:

Save Current URL: javascript:q=(document.location.href);void(open('http://web.archive.org/save/*/'+location.href.replace(/https..., ""),'_self ','resizable,location,menubar,toolbar,scrollbars,status'));

GoBack Current URL: javascript:q=(document.location.href);void(open('http://web.archive.org/web/*/'+location.href.replace(/https?..., ""),'_self ','resizable,location,menubar,toolbar,scrollbars,status'));
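As a readable sketch of what the two bookmarklets do: the functions below build the "save" and "look up" URLs for a given page. The names `WAYBACK_BASE`, `buildSaveUrl` and `buildLookupUrl` are made up for illustration. The save path uses the `/save/<url>` form that appears elsewhere in this thread, and the scheme-stripping `replace(...)` from the bookmarklets is left out because its regex is truncated above.

```javascript
// Sketch only: WAYBACK_BASE and both function names are illustrative,
// not part of any official Wayback Machine API.
const WAYBACK_BASE = "https://web.archive.org";

// URL that asks the Wayback Machine to capture `pageUrl` now
// (the /save/<url> form quoted elsewhere in this thread).
function buildSaveUrl(pageUrl) {
  return `${WAYBACK_BASE}/save/${pageUrl}`;
}

// URL that lists existing captures of `pageUrl` (the /web/*/<url> form).
function buildLookupUrl(pageUrl) {
  return `${WAYBACK_BASE}/web/*/${pageUrl}`;
}

// In a bookmarklet you would pass location.href and navigate, e.g.:
//   location.href = buildSaveUrl(location.href);
console.log(buildSaveUrl("https://tesoro.io/"));
console.log(buildLookupUrl("https://tesoro.io/"));
```

Keeping the URL construction in plain functions makes it easy to check outside the browser before squeezing it into a one-line `javascript:` bookmark.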

thallesr 3321 days ago

Using Zotero's webshot feature should do this. I use it this way.

agamble 3321 days ago

Great idea, thanks for the feedback :)

dschep 3321 days ago

So, like another top-level commenter asked: why build this or use this instead of archive.is? There are already multiple extensions available for Chrome for it ;)

I agree with GP here that anything billed as "My own internet archive" should be run on my computer, not someone else's.

johnaberlin 3320 days ago

Hi agamble,

You can do just that via WARCreate: https://chrome.google.com/webstore/detail/warcreate/kenncghf... (http://warcreate.com).

I am a core contributor to this project on GitHub (https://github.com/machawk1/warcreate) and the maintainer/creator of the latest version of WAIL. So I am not biased in any way ;)

detaro 3321 days ago

You can trigger the Internet Archive manually as well.

agamble 3321 days ago

Oh neat, I didn't realise that. My mistake :)

Sophira 3321 days ago

To be fair, the easiest way to do it (to my knowledge) is by direct URL entry - for example (replace 'hxxps' with 'https' for this - I didn't want crawlers to pick this up):

hxxps://web.archive.org/save/https://tesoro.io/
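If you would rather not edit the link by hand, here is a minimal sketch of undoing the 'hxxps' substitution in code. `refang` is a hypothetical helper name, and it only rewrites the leading scheme, so the target URL embedded after `/save/` is left untouched.

```javascript
// Sketch: turn a "defanged" link (hxxp:// or hxxps://) back into a usable one.
// `refang` is an illustrative name, not a function from any library.
function refang(defangedUrl) {
  // Only the scheme at the very start of the string is rewritten.
  return defangedUrl.replace(/^hxxp(s?):\/\//, "http$1://");
}

console.log(refang("hxxps://web.archive.org/save/https://tesoro.io/"));
```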

jtrip 3321 days ago

A local download only increases the redundancy. Tesoro keeps a copy, and the user keeps a local copy that they can use however they want. It's a bit like keeping newspaper clippings that some relative finds decades later and posts on social websites as something interesting.

rathish_g 3321 days ago

Good work. For research and citation purposes, a permalink outside the source domain is needed, one that can be trusted and will last for decades.

unicornporn 3321 days ago

https://github.com/webrecorder/webrecorder can be run using Docker. There are also plenty of proxies that can save your browsing. See: http://netpreserve.org/projects/live-archiving-http-proxy/

WhiteOwlLion 3321 days ago

Have you looked at WorldBrain? It is a fork of Falcon, but it keeps a cache and lets you perform keyword searches against the cached content.