by byuu 2274 days ago
Hello,

I removed the archive.is check and requested that the entire site be crawled, which is now done. The entire site is mirrored here: http://archive.is/byuu.org

I also e-mailed the archive.org staff to request that the exclusion be removed.

I hope this will satisfy your request, and that you can sympathize with why I chose not to do this until retiring.


Excellent! It seems there is still a user-agent check in place, though: visiting your site with an archivebot user-agent just shows "403 Forbidden".

The emulator design articles you've written are interesting and valuable, and I'm glad that they will be preserved for future emulator developers to read.

Oh, sorry, I forgot I also had user-agent checks in the code. Good catch! The archive.is check was based on IP ranges, since they spoof their user agent as Chrome and use proxies. I've removed that now too. All that should be left is a noindex tag on the 1,200 individual game pages at byuu.org/preservation, which I'm told is very important for web indexers, since they're all auto-generated thin-content pages. I believe both archive.is and archive.org ignore noindex anyway, and the full databases are on GitHub at icarus/Database. We should be all good now, but please let me know if you find any other issues.

I don't know how long it'll take for archive.org to remove the exclusion; requesting it originally took a couple of weeks. Either way, they have my approval to start indexing it all again.

I've already paid for the next year of hosting, so the site should remain online for at least that long. However, I might not keep paying for it indefinitely if I don't return, so if folks want it mirrored in even more places, now is the time.