by amelius 3576 days ago
Isn't that exactly what search engines are for? I.e., ranking work based on a complex mix of both content value and authority?

Yes, improvements in search will help, but the costs won't go down that easily. In Google's case, the initial seeding of PageRank was quite manual. And then think of the cost of upkeep: people continuously try to game search engines, Google has to update its algorithms on a regular basis, content farms have profited enormously at various points in time and had to be explicitly programmed against, and finally Google guards the actual search algorithm closely.

In the research domain, solving these problems would actually be even harder (in my view). How do you know if you found the best paper, or just the paper which is the best match for your keywords? At least Google has a feedback mechanism: someone stays a long time on a given webpage if it is very relevant to what they are looking for. This is obviously not a good metric for research, since someone might stay a long time on a paper simply because it is too obscure :-)

Well, you could always look at citations. And I don't think they are as easy to game as links between general websites, because in research at least the authors publish using their own names (and you don't want to get a bad rep for gaming the system).

With the disclaimer that this is an anecdote and not data:

You would be surprised at how easily a winners-win situation happens in research. Citation-based search would reinforce it. And while the gaming may not be search-engine focused, I think getting the best papers via algorithmic methods can omit the crown jewels through less insidious (but quite common) issues, such as citation graphs that orient in the direction of the flow of funding.

But, you might say, maybe winners win for a reason. This is only personal experience, but the single most profound, creative paper I ever read during my years of research was written by a lone wolf (i.e. no collaborators) at a somewhat unknown institution who turned out to be a sort of one-hit wonder. This person's h-index may very well have been exactly 1 at that time. I honestly think algorithmic methods of searching for literature would have skipped past that paper.

You could make the case, though, that a thorough literature survey should be as exhaustive as possible and not omit ANYTHING. Well, very few people are that thorough - and even when they are, there is a tendency to read papers from the most popular authors first. I am just glad I did my work before Google Scholar became the de facto starting point, so I did not have the bias of a pre-ranked list.

I think that is the actual fear: I was able to find this crown jewel precisely because the publishing process in that period was more centralized (although quite likely also less competitive), and that paper was eventually published at a pre-eminent conference - which is how it came to my attention. With search-engine-driven open access, I think this lone wolf would have had a harder time getting that fantastic piece of work in front of a big audience, because many of the common signals would have been too weak.

With all that said, when open access becomes more pervasive, great search technology will be a big part of the cost reduction and I definitely look forward to that.

Good point, but if you make the comparison with the web, you can find the more popular pages using a search engine, and you can find those lesser known jewels by using services like HN :)

What is the paper you mentioned in the reply below? (Can't reply to that one)

I thought about adding it in my original comment, but I want to keep my account as anonymous as reasonably possible. Sorry about that!