by qeorge 4166 days ago
Startup idea, free for the taking: create a service that "ossifies" dynamic websites into static HTML.

(By ossify, I mean to take something dynamic and make it static).

For example, that WordPress site you commissioned for a movie 3 years ago? It's a huge liability, but you don't have to take it offline - just ossify it. No one is updating that blog anymore!

Under the hood, it would basically be a crawler, and the deliverable would be a zip file containing a 1-to-1, static copy of their website with all URLs still working. I suspect most folks here could whip up a shitty proof of concept in 48 hours.
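For the curious, here is roughly what that proof of concept could look like in Python. It is a rough sketch, not a finished tool. The names `ossify`, `zip_name` and `http_fetch` are made up for illustration, it only follows `<a href>` links on the same host, and it ignores query strings, assets, and error handling.

```python
import io
import zipfile
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Default fetcher: return the raw response body as bytes."""
    with urlopen(url) as resp:
        return resp.read()


def zip_name(url):
    """Map a URL to a file name in the archive (assumed scheme: '/' -> index.html)."""
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += "index.html"
    return path.lstrip("/")


def ossify(start_url, fetch=http_fetch, max_pages=500):
    """Breadth-first crawl of one host; returns the bytes of a zip archive."""
    host = urlsplit(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    buf = io.BytesIO()
    saved = 0
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        while queue and saved < max_pages:
            url = queue.popleft()
            body = fetch(url)
            archive.writestr(zip_name(url), body)
            saved += 1
            parser = LinkParser()
            parser.feed(body.decode("utf-8", errors="replace"))
            for href in parser.links:
                # Resolve relative links and drop "#fragment" parts.
                link, _fragment = urldefrag(urljoin(url, href))
                if urlsplit(link).netloc == host and link not in seen:
                    seen.add(link)
                    queue.append(link)
    return buf.getvalue()
```

Calling `ossify("http://site.example.com/")` and writing the result to disk would give you the zip. The `fetch` parameter is only there so the crawl logic can be tried against a dict of canned pages without touching the network.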

If someone does this, email me! I have a couple of potential clients for you (I'm a former consultant, with lots of WordPress sites in my history).

3 comments

You can get pretty close to this, I think, with:

    wget --mirror --convert-links http://site.example.com/

From the wget manual:

    --convert-links
           After the download is complete, convert the links in the document
           to make them suitable for local viewing.  This affects not only the
           visible hyperlinks, but any part of the document that links to
           external content, such as embedded images, links to style sheets,
           hyperlinks to non-HTML content, etc.

           The links to files that have been downloaded by Wget will be
           changed to refer to the file they point to as a relative link.

           Example: if the downloaded file /foo/doc.html links to
           /bar/img.gif, also downloaded, then the link in doc.html will
           be modified to point to ../bar/img.gif.  This kind of
           transformation works reliably for arbitrary combinations of
           directories.
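To make the manual's example concrete, here is the same rewrite in a few lines of Python. This is only an illustration of the idea, not how wget implements it, and `relative_link` is a made-up helper name.

```python
import posixpath


def relative_link(from_page, to_file):
    """Express an absolute site path relative to the page that links to it."""
    return posixpath.relpath(to_file, start=posixpath.dirname(from_page))


# The example from the manual: doc.html in /foo linking to an image in /bar.
print(relative_link("/foo/doc.html", "/bar/img.gif"))  # ../bar/img.gif
```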

This is already a product that exists many times over. Besides the aforementioned wget, I've recommended SiteSucker, a Mac/iOS app, to less technical users.

I'd be happy to bill your clients to do it for them though!

Here's the thing: they don't want to run SiteSucker. That's only slightly more helpful than telling them to just run wget!

They want to write a check and get back to business, not become a web developer!

(I think that tool is awesome though, and I appreciate the tip!)

That's basically what the Wayback Machine does: https://archive.org/web/

Yes! It would be like the Wayback Machine as a service, but with some key differences:

1) The intention is that you replace your dynamic site with the static copy, but your visitors are none the wiser. All URLs are the same, as well as the content returned. Might require some .htaccess trickery.

2) It would have to preserve all the images, CSS, and other assets, some possibly hotlinked. (The Wayback Machine is not awesome at this, understandably.)
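A rough Python sketch of both points, under some loud assumptions: `static_path` and `split_assets` are hypothetical helper names, directory-style URLs are assumed to be served from an `index.html`, and only `<img>`, `<script>` and `<link>` tags are inspected (no `srcset`, no `url()` references inside CSS).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Tag -> attribute that pulls in an asset. Deliberately incomplete, and
# <link href> also matches non-stylesheet links such as rel="canonical".
ASSET_ATTRS = {"img": "src", "script": "src", "link": "href"}


class AssetParser(HTMLParser):
    """Collects raw asset references from the tags listed in ASSET_ATTRS."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = ASSET_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def split_assets(html, page_url):
    """Return (local, hotlinked) lists of absolute asset URLs found in html."""
    parser = AssetParser()
    parser.feed(html)
    host = urlsplit(page_url).netloc
    local, hotlinked = [], []
    for raw in parser.urls:
        absolute = urljoin(page_url, raw)
        if urlsplit(absolute).netloc == host:
            local.append(absolute)
        else:
            hotlinked.append(absolute)
    return local, hotlinked


def static_path(url):
    """File path under the document root that keeps the original URL working."""
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += "index.html"  # assumes the server's directory index is index.html
    return path.lstrip("/")
```

Hotlinked assets would need to be downloaded and the references rewritten, or left pointing at the third party. URLs with query strings, or extensionless paths without a trailing slash, are where the .htaccess trickery would probably come in, and this sketch does not attempt it.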