by TikiTDO 1760 days ago

I think this problem comes down to two core issues: discoverability and terminology.

You're going to be lucky if a paper from the 70s or 80s is available in a searchable database at all. That requires someone to have bothered to scan it and OCR it at some point since. Even the few papers that are searchable are old enough that they probably won't catch anyone's eye unless the reader is desperate.

Of course then there's also the problem of knowing what to search for. Programmers love to invent, reinvent, and re-reinvent terminology. It's only gotten worse with every other developer running a blog trying to explain complex ideas in simple terms.

The entire field of ML is a perfect example of this. I remember talking to my father about all sorts of new developments in ML back in the early 2010s, and I was quite surprised when he told me that he had learned a lot of the things I was talking about back in the 80s, just under slightly different names.

In most cases it ends up being a question of how much time you can put into any given problem. If I spend two weeks finding a paper that would have taken me a week to reinvent, then am I really ahead? If the knowledge wasn't important enough to make it into textbooks/classes/common knowledge, then attempting to find it is akin to searching for a particular needle in a pile of needles.

amw-zero 1760 days ago

I have never come across a popular CS paper that was not available on the web, for what it’s worth. Maybe some of the lesser known papers are lost, but all of the important ones, such as Codd’s writing, are very easily accessible with simple search engine searches.

TikiTDO 1760 days ago

The important and popular ones are absolutely available, but those are usually important because they have entered the realm of "common knowledge," at least in a particular sub-field. These are going to be at the top of the list when it comes to digitizing useful historic records. It's fairly easy to OCR a PDF, so as long as someone with some time decided "hey, this might be useful" then you'll probably be able to find it.

If you're doing databases then you've almost certainly been exposed to Codd's work, if not through his papers and books, then at least through textbooks and lectures. There are countless blogs, lecture series, and presentations that will happily direct you there.

The challenge is that there's also a mountain of work that never really got much popularity for whatever reason. Say a paper was ahead of its time, or was released with bad timing, or simply kept the most interesting parts until the end where few people might have noticed. It's these sorts of gems that are hard to find. It's hard to even know how many of these there are, because they are by definition not popular enough for most people to know about them.