by mailxplorer 3008 days ago
It's my fault (not the IA), I thought I'd got all newsgroups but must have missed some. I just checked the main newsgroup list and it's incomplete, for some reason.

My plan, though, is to dust off the old code, get a complete list of groups, download them, and then make the archive searchable.

Sorry about that, I didn't check. This was all done in 2013. Basically, I wanted to build a search engine but indexing the newsgroup posts (for header and text body search) would take too long. I abandoned it, then in 2016 I sent it to the IA.

So, I only just found out it's lacking a bunch of groups, 5 years on...

2 comments

This is much appreciated. :-) I should note that the comp.sys.amiga.* newsgroups were first established on Jan 8, 1991. The first posts in the mbox archive for comp.sys.amiga.programmer, however, are from May 31, 1994. It looks like the first three years might be missing, at least for this group. I haven't checked the other Amiga groups yet.
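For anyone who wants to run the same check on other groups, here is a minimal sketch of how the earliest post date in an mbox file could be found with Python's standard library. The function name `earliest_post_date` and the file path in the usage note are hypothetical, and the sketch assumes each post carries a parseable `Date` header; posts without one are skipped, and dates with no timezone are treated as UTC.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def earliest_post_date(mbox_path: str) -> Optional[datetime]:
    """Return the earliest parseable Date header in an mbox file, or None."""
    earliest = None
    # create=False: raise an error instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            raw_date = message.get("Date")
            if raw_date is None:
                continue  # no Date header, skip this post
            try:
                posted = parsedate_to_datetime(str(raw_date))
            except (TypeError, ValueError):
                continue  # malformed Date header, skip this post
            if posted.tzinfo is None:
                # Assumption: treat dates without a timezone as UTC so that
                # naive and aware datetimes can be compared.
                posted = posted.replace(tzinfo=timezone.utc)
            if earliest is None or posted < earliest:
                earliest = posted
    finally:
        box.close()
    return earliest
```

Calling `earliest_post_date("comp.sys.amiga.programmer.mbox")` (a made-up local filename) would return the oldest dated post, which can then be compared against the date the group was created. Header dates in old Usenet posts are not always trustworthy, so the result is only a rough indicator of where the archive starts.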
No apologies necessary: creating the archive in the first place was awesome! Like I said, this just shows how massive and complex Usenet is (or maybe, was), and how it's not easy at all to create a comprehensive archive. It's far better to have some of it than none of it!
From what I can gather, I downloaded one tenth of Usenet: 11,000 groups. This means if I'd done all 110,000 groups it would have taken me about a year to download them, and an 8 TB drive to store them (in 2013!). That wasn't really feasible...
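As a rough check on those figures, here is a small sketch of the arithmetic. It assumes download time and storage scale linearly with the number of groups, which is a simplification since group sizes vary enormously. The constants come from the comment above; the function name `implied_partial_cost` is made up for illustration.

```python
# Figures quoted in the comment above.
GROUPS_DOWNLOADED = 11_000
GROUPS_TOTAL = 110_000
FULL_DOWNLOAD_DAYS = 365.0  # "about a year"
FULL_STORAGE_TB = 8.0


def implied_partial_cost(groups_done: int, groups_total: int,
                         full_days: float, full_tb: float):
    """Scale the full-archive estimate back down to the part that was fetched.

    Assumes cost is proportional to the number of groups (a simplification).
    Returns (scale_factor, days_for_partial, tb_for_partial).
    """
    scale = groups_total / groups_done
    return scale, full_days / scale, full_tb / scale


scale, days, terabytes = implied_partial_cost(
    GROUPS_DOWNLOADED, GROUPS_TOTAL, FULL_DOWNLOAD_DAYS, FULL_STORAGE_TB
)
print(f"full archive is {scale:.0f}x the downloaded part")
print(f"implied cost of the downloaded part: ~{days:.1f} days, ~{terabytes:.1f} TB")
```

Under that linear assumption, the tenth that was actually downloaded corresponds to roughly five weeks of downloading and a bit under 1 TB of storage.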