It's pretty sad that a bunch of people sent historical Usenet archives to Google, they imported them into Google Groups, and then... basically hid/lost access to it over time. I assume the data is in GFS somewhere but it may never see the light of day. But maybe we just need to shame them every decade: https://www.wired.com/2009/10/usenet/
Google Groups was great when they first acquired all those archives! It just got progressively worse and worse over time, and I've never understood why. (Stagnation I could understand, but how did the same searches get less effective?)
Google has a continual Red Queen's race between everyone's dependencies. I'm guessing they were lucky to keep it mostly working and avoid Reader's fate.
"We're deprecating X, please migrate to Y by this date."
I get that Google essentially walked back from an at least implied role as an information archivist. But I still don't really get why they so completely abandoned things like Google Groups given the truly minuscule resources to maintain them at some low level.
Nobody can get promoted for improvements like that. It would only garner goodwill with a tiny customer base, and it’s not clear how that would translate to revenue or user growth on other products.
Totally agree, but I kinda feel (and this is solely my opinion) that much as it's our collective responsibility to donate to worthwhile causes, gigantic tech companies should spend rounding-error money on good digital causes like this.
Google's original mission statement from 1998 was to “organize the world’s information and make it universally accessible and useful.” Things like Usenet archives are clearly part of that.
That is still apparently their mission statement but I agree the reality seems more along the lines of "deliver the best search results for users' immediate needs" [while making as much money from that as humanly possible]
Somebody (and most likely a team of somebodies) needs to own it. If nobody at the top is willing to champion the project and carve out a budget and people, then they need to fold it.
I wish Google would donate those logs to archive.org and maybe even some servers. Idk why some of these tech giants never donate to projects that care about the internet, the very thing which drives their profits.
“You searched for Stockholm Syndrome Research. Here’s a bunch of Stockholm travel blogs! They’re just missing two of your keywords, but they’re really popular with other visitors.”
Quotes still force the word to be included, don't they?
Though I suppose that still doesn't do much if the algorithm has tossed out the matches you want.
Google's Usenet archive is too important to be owned by one company. My understanding is some Google engineers worked a few years ago to make sure Archive had a copy, but I can't verify that and I don't think it's online. Some of the earliest archives come from Eugene Spafford's collection which is readily online elsewhere, but Google did a lot of work cleaning it and of course has the DejaNews archives which are invaluable.
I'm amazed Google Groups still exists as a product. It seems abandoned internally and I expect to hear it's shut down any month now.
I miss dejanews - anyone remember that before Google bought them? They had full search, it's a shame Google let that go. Their search went back to pretty much the beginning of net news. Does that exist anywhere anymore?
At least some of the Usenet archives are still online in Google Groups and searchable. Can't vouch for completeness of the search index though, it looks pretty wonky for stuff from the early 1990s. Anyway, example working archive link I found via search: https://groups.google.com/forum/#!searchin/comp.os.minix/tor...
Don't be too hard on Google's acquisition of Deja. I wasn't working at Google at the time, but heard from many colleagues when I joined that the Deja acquisition was quite chaotic because Deja was just about to shut down entirely when Google picked it up rather than let it disappear. It's a shame they don't do better with the Usenet archive now, but it's clearly not Google's business anymore.
Dejanews was simply fantastic. When Google embraced and extinguished it and then later killed the discussion filter from its front page I finally realized the Internet as we knew it until the late 90s/early 2K was dead. It transitioned from an instrument I could use to find people exchanging genuine opinions on stuff to a way to inundate my search results with biased people promoting or selling that stuff.
I remember dejanews, but don't remember liking the experience. I remember being very excited when Google acquired them.
I remember being super bummed when bitrot set in. Posts I knew existed just couldn't be found.
As someone who got their first car running by way of Usenet archives (a 69 Beetle), I always loved how well it was archived compared to all the web bulletin boards that fragment so much information as they quickly rot year after year.
For all of you who are misty-eyed pondering those wonderful, probably-lost-forever Usenet posts from the 90s, I dare you to read Kibo's .signature (last updated 5/5/94 4:52AM <-- CINCO DE MAY-O !!!!) at http://archive.birdhouse.org/etc/kibosig.txt to cure yourselves of misplaced misty-eyed nostalgia for those long-ago times when the Internet was something else.
Yes, but much of the Wayback Machine’s reddit content was specifically targeted and scraped by ArchiveTeam, who are volunteers that seek out at-risk content from the web and make sure that it gets into the Wayback. In the past few years we’ve specifically tried to go after sub-reddits that we thought were newsworthy and/or at high risk for deletion. But there’s no way we can get all of it.
Source: am ArchiveTeam member, run various pipelines, have scraped sub-reddits ranging from The_Donald to the cryptocurrency worlds to darknet markets.
It's great to see some Usenet archives out there to partly make up for the disappointment of Google Groups. But I'm sad that this archive seems to be incomplete, even within its stated date range. Back in the day, I was active on rec.arts.books.tolkien and alt.fan.tolkien: in this archive, I can't find any trace of the massive "alt." hierarchy at all, and the list of files for the "rec." hierarchy doesn't include the Tolkien group. For that matter, the list includes rec.humor.funny and rec.humor.d and others, but apparently not rec.humor itself. (It really does make you appreciate just how substantial the effort of collecting a comprehensive Usenet archive would be.)
On another note, not that anyone here would be able to fix it, but this list would be a lot easier to search through if the item names didn't all begin with "Usenet newsgroups within", so you could jump to first letters in a meaningful way.
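For anyone working from a saved copy of that list, a minimal sketch of the workaround: drop the shared prefix before sorting, so entries group by hierarchy. The sample titles below are hypothetical (only the "Usenet newsgroups within" prefix comes from the listing itself), and the helper names are made up for illustration.

```python
# Prefix quoted from the archive listing; everything after it is assumed
# to be the newsgroup or hierarchy name.
PREFIX = "Usenet newsgroups within"


def strip_prefix(title: str) -> str:
    """Return the title without the shared prefix (unchanged if absent)."""
    if title.startswith(PREFIX):
        return title[len(PREFIX):].strip()
    return title.strip()


def sort_by_group(titles):
    """Sort titles by the part after the prefix, case-insensitively."""
    return sorted(titles, key=lambda t: strip_prefix(t).lower())


# Hypothetical sample titles, not copied from the real listing.
titles = [
    "Usenet newsgroups within rec.humor.funny",
    "Usenet newsgroups within comp.os.minix",
    "Usenet newsgroups within rec.humor.d",
]

for title in sort_by_group(titles):
    print(strip_prefix(title))
```

This only helps with a local copy of the names, of course; it doesn't change how the items are titled on the site.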
It's my fault (not the IA), I thought I'd got all newsgroups but must have missed some. I just checked the main newsgroup list and it's incomplete, for some reason.
My plan though, is to dust off the old code, get a complete list of groups, get them, and then make it searchable.
Sorry about that, I didn't check. This was all done in 2013. Basically, I wanted to build a search engine but indexing the newsgroup posts (for header and text body search) would take too long. I abandoned it, then in 2016 I sent it to the IA.
So, I only just found out it's lacking a bunch of groups, 5 years on...
This is much appreciated....:-)
I should like to note that the comp.sys.amiga.* newsgroups were first established Jan 8, 1991. The first posts in the mbox archive for comp.sys.amiga.programmer, however, are from May 31, 1994. It looks like the first three years might be missing, at least for this group. I haven't checked the other amiga groups yet.
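If anyone wants to run the same check on other groups, here is a rough sketch using Python's standard `mailbox` module. It assumes each group is a plain mbox file with ordinary `Date:` headers; the function name is made up, and messages with missing or unparseable dates are simply skipped, so treat the result as a hint rather than proof.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def earliest_post(mbox_path):
    """Return (datetime, subject) for the oldest dated message, or None."""
    oldest = None
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            raw = msg.get("Date")
            if not raw:
                continue  # no Date header at all
            try:
                when = parsedate_to_datetime(str(raw))
            except (TypeError, ValueError):
                continue  # malformed date, skip it
            if when.tzinfo is None:
                # Assumption: treat dates without a zone as UTC so that
                # all timestamps can be compared with each other.
                when = when.replace(tzinfo=timezone.utc)
            if oldest is None or when < oldest[0]:
                oldest = (when, str(msg.get("Subject", "")))
    finally:
        box.close()
    return oldest
```

Calling `earliest_post` on a local copy of a group's mbox file gives the oldest post it could date, which can then be compared with the group's creation date.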
No apologies necessary: creating the archive in the first place was awesome! Like I said, this just shows how massive and complex Usenet is (or maybe, was), and how it's not easy at all to create a comprehensive archive. It's far better to have some of it than none of it!
From what I can gather, I downloaded one tenth of Usenet - 11,000 groups. This means if I'd done all 110,000 groups it would have taken me about a year to download them, and an 8TB drive to store them (in 2013!). That wasn't really feasible...
Right: As I said earlier, the list in this archive includes rec.humor.funny but not rec.arts.books.tolkien. (And I seem to recall that the alt.* hierarchy was a little less broadly propagated, which might possibly be related to why it wasn't included here.)
There are also groups within an existing hierarchy missing. For example, rec.games.roguelike.development is there but rec.games.roguelike.nethack is missing.
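Spotting gaps like these is basically a set difference between a reference list of groups and what the archive actually holds. A small sketch, assuming both lists are available as plain group names (the function name and sample lists are hypothetical; the group names are the ones mentioned in this thread):

```python
def missing_groups(expected, archived):
    """Return the sorted group names in `expected` that `archived` lacks."""
    have = {g.strip().lower() for g in archived if g.strip()}
    want = {g.strip().lower() for g in expected if g.strip()}
    return sorted(want - have)


# Hypothetical inputs built from groups named in this thread.
expected = [
    "rec.games.roguelike.development",
    "rec.games.roguelike.nethack",
    "rec.humor.funny",
    "rec.arts.books.tolkien",
]
archived = [
    "rec.games.roguelike.development",
    "rec.humor.funny",
]

for group in missing_groups(expected, archived):
    print(group)
```

The hard part is getting a trustworthy reference list in the first place; the comparison itself is trivial.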
Usenet was my very first exposure to internet communities. I read and posted on alt.games.nintendo.pokemon between ages 9 and 13, then moved to Something Awful after that.
I'm gonna put my dinosaur hat on and remind everyone that you can still use USENET in 2018. It's alive, well, and if you don't need access to binary groups, it's also free and very straightforward.
There are still some interesting discussions going on, mostly in the technical groups.
I still open it up maybe once a month or so for nostalgia, though I haven't posted in a while.
Usenet articles periodically get promoted to the front page. I wonder what that says about the age demographics of HN, given that Usenet hasn't been significant in maybe 20 years (and even then it was a niche). Is the younger generation here? And if not here, where?
I'm kind of offended that you seem to think young people can't be aware of things that happened before their time.
EDIT: To elaborate a bit, my expectation would be that the sort of young person using HN leans more historically interested than normal, and is more likely to appreciate the value of things like the Internet Archive, etc. Usenet is a huge part of digital history, and the fact that it's not available even though archives were kept is something of a tragedy.