Hacker News
by Steuard 3008 days ago
It's great to see some Usenet archives out there to partly make up for the disappointment of Google Groups. But I'm sad that this archive seems to be incomplete, even within its stated date range. Back in the day, I was active on rec.arts.books.tolkien and alt.fan.tolkien: in this archive, I can't find any trace of the massive "alt." hierarchy at all, and the list of files for the "rec." hierarchy doesn't include the Tolkien group. For that matter, the list includes rec.humor.funny and rec.humor.d and others, but apparently not rec.humor itself. (It really does make you appreciate just how substantial the effort of collecting a comprehensive Usenet archive would be.)

On another note, not that anyone here would be able to fix it, but this list would be a lot easier to search through if the item names didn't all begin with "Usenet newsgroups within", so you could jump to first letters in a meaningful way.

2 comments

It's my fault (not the IA's). I thought I'd got all the newsgroups but must have missed some. I just checked the main newsgroup list and it's incomplete, for some reason.

My plan, though, is to dust off the old code, get a complete list of groups, get them, and then make it searchable.

Sorry about that, I didn't check. This was all done in 2013. Basically, I wanted to build a search engine but indexing the newsgroup posts (for header and text body search) would take too long. I abandoned it, then in 2016 I sent it to the IA.

So, I only just found out it's lacking a bunch of groups, 5 years on...

This is much appreciated. :-) I should note that the comp.sys.amiga.* newsgroups were first established on Jan 8, 1991. The first posts in the mbox archive for comp.sys.amiga.programmer, however, are from May 31, 1994. It looks like the first three years might be missing, at least for this group. I haven't checked the other Amiga groups yet.
No apologies necessary: creating the archive in the first place was awesome! Like I said, this just shows how massive and complex Usenet is (or maybe, was), and how it's not easy at all to create a comprehensive archive. It's far better to have some of it than none of it!
From what I can gather, I downloaded one tenth of Usenet - 11,000 groups. This means if I'd done all 110,000 groups it would have taken me about a year to download them, and an 8TB drive to store them (in 2013!). That wasn't really feasible...
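For anyone who wants to check the arithmetic above, here is a rough sketch. It assumes download time and storage scale linearly with the number of groups, which is only an approximation since group sizes vary enormously. The function name and the per-sample figures it derives are illustrative, not numbers reported in the thread.

```python
def implied_sample_figures(sample_groups, total_groups, total_days, total_tb):
    """Back out what the partial download would have cost, assuming
    (hypothetically) that time and storage scale linearly with group count."""
    scale = total_groups / sample_groups
    return scale, total_days / scale, total_tb / scale


# Figures from the comment: 11,000 of 110,000 groups; about a year and 8 TB for all.
scale, sample_days, sample_tb = implied_sample_figures(11_000, 110_000, 365, 8)
print(f"scale x{scale:.0f}: ~{sample_days:.1f} days and ~{sample_tb:.1f} TB for the sample")
```

Under that linear assumption, the one-tenth sample corresponds to roughly five weeks of downloading and a bit under 1 TB.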
It looks like the downloads are grouped by top level group.

So you can go to https://archive.org/download/gna-rec and see all archived groups that are in the rec hierarchy.
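A minimal sketch of how one might check a hierarchy for missing groups, rather than scanning the listing by eye. Only the `gna-rec` identifier comes from the URL above. The `gna-<hierarchy>` pattern for other hierarchies and the `.mbox.zip` file suffix are assumptions, so adjust them to whatever the real listing shows. The file list is read from the Internet Archive metadata endpoint (`https://archive.org/metadata/<identifier>`).

```python
import json
from urllib.request import urlopen

# Assumed suffix of each per-group file; change it to match the actual listing.
ARCHIVE_SUFFIX = ".mbox.zip"


def item_identifier(hierarchy):
    """Item name for a top-level hierarchy, assuming the gna-<hierarchy> pattern."""
    return f"gna-{hierarchy}"


def fetch_file_names(hierarchy):
    """Return the file names listed in the item's metadata (needs network access)."""
    url = f"https://archive.org/metadata/{item_identifier(hierarchy)}"
    with urlopen(url) as resp:
        meta = json.load(resp)
    return [f["name"] for f in meta.get("files", [])]


def missing_groups(wanted, file_names, suffix=ARCHIVE_SUFFIX):
    """Return the wanted groups that have no exactly matching file.

    Matching is exact on purpose: a prefix match would wrongly count
    rec.humor as present just because rec.humor.funny is.
    """
    present = {name[: -len(suffix)] for name in file_names if name.endswith(suffix)}
    return [group for group in wanted if group not in present]
```

For example, `missing_groups(["rec.humor", "rec.arts.books.tolkien"], fetch_file_names("rec"))` would list whichever of those two groups has no file in the item, provided the suffix assumption holds.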

Right: As I said earlier, the list in this archive includes rec.humor.funny but not rec.arts.books.tolkien. (And I seem to recall that the alt.* hierarchy was a little less broadly propagated, which might possibly be related to why it wasn't included here.)
It looks like alt is missing, sadly.
There are also groups missing from within existing hierarchies. For example, rec.games.roguelike.development is there but rec.games.roguelike.nethack is missing.