by roenxi 712 days ago
I also think this is going to bifurcate scientific research. Communities that are willing to run AI over their knowledge base are going to develop a big advantage over those who don't.

I have a friend who applies research to businesses as a consultant. One of his biggest challenges is how to index all the papers and work out what is relevant to a particular topic. I don't know if the current generation of bots is up to the challenge, but sooner or later ProfessorGPT will be perfect for that niche. Then journals that force humans to manually search through large numbers of papers will be massive albatrosses that hamper scientific progress.

2 comments

> Communities that are willing to run AI over their knowledge base are going to develop a big advantage over those who don't

This is debatable.

I've seen countless "AI on knowledge base" projects, and on the whole none have been much better than just using ElasticSearch. Some aspects are better (e.g. discovery), but others are worse (e.g. accuracy, and speed when you are looking for something specific).

I would argue that simply having a knowledge graph in front that can provide related papers for a topic would accomplish the goals better.

> Communities that are willing to run AI over their knowledge base are going to develop a big advantage over those who don't.

I have a hard time seeing this. If you're an academic or an industrial researcher, the hard part of the literature review isn't finding the relevant papers, it's digesting them--and in some fields (e.g., chemistry), replicating their results. If you're more an industry person trying to apply academic research, well in general, you probably want a good textbook synthesis of the field rather than trying to understand stuff from research papers.

From your second paragraph, it seems to me that you're thinking AI will help with the textbook synthesis step, but this is the sort of thing that, as far as I can tell, current LLMs are just fundamentally bad at. To use a concrete example, I have been off-and-on poking at research into simplex presolving, and one of the things you quickly find is that just about everybody has their own definition of the "standard model", so to mix and match different papers, you have to start by recasting everything into a single model. And capturing the nuance of "these papers use the same symbols to mean completely different things" isn't a strong point of LLMs.

> If you're more an industry person trying to apply academic research, well in general, you probably want a good textbook synthesis of the field rather than trying to understand stuff from research papers.

That sentence there is what will probably be the wedge point that gives LLM-heavy communities an advantage. As LLMs improve, the question becomes "why shouldn't industry people apply academic research directly?"

> ... as far as I can tell, current LLMs are just ...

We're in the upswing of a new technology; it wasn't that long ago that interesting progress was a monthly or weekly occurrence. I'm not too fazed about where we might be right now. Alibaba are one of the companies with every chance of pushing the state of the art forward and, regardless of that, the state of the art is going to get pushed by someone.

To make an analogy, right now using an LLM filter to read the literature is like reading Scientific American or New Scientist - fun, interesting, entertaining and not always right on the detail.

Let's say, for example, you wanted to build your own cutting edge LLM - would you just ask an LLM how to do so? Or would you need to do more, and would a simple literature/internet search be just as effective as a starting point?

Note that in my experience, when you are a world expert in some tiny area (like when doing a PhD), you realize that quite a large proportion (~50%) of the stuff published in the area you really know about is wrong in whole or in part, and another good proportion doesn't really move the field on.

So back to the original question - how did OpenAI get a lead in LLMs? The story I heard was that they talked to leading academics about who the best people in the field were and tried to hire them all.

i.e. to paraphrase Richard Feynman on the Emperor's nose question - you don't really find out the true answer by averaging over loads of ill-informed opinions - much better to carefully examine the nose/data source yourself.

I wouldn't go so far as a sibling commenter and say that most academic research is irreproducible bullshit. But academic research does tend to be chewing-gum-and-baling-wire products that are meant to hold together just long enough to get the necessary results. The rate-limiting step of turning academic research into useful products isn't "let's flip through all the academic research to find interesting papers," it's "figure out how to make this very-barely-works academic product usable on anything other than the exact things they did for the results section."

And, to be blunt, I have never seen anyone pitch an AI project to do that. AI pitches, even today, are almost invariably solving problems that are already decently solved (search is essentially a solved problem). And most of their proponents have shown no willingness to listen to the practitioners telling them what the actual problems are that they need better solutions for.

Industry people (usually) shouldn't apply academic research directly because the majority of peer-reviewed published papers are irreproducible bullshit. Of course there is an occasional jewel in the muck so industry people with the skill (or luck) to identify those can get a jump on their competitors.

Industry would not have gotten to this stage in LLMs without academia. Your ignorance is not an excuse for spouting bullshit.