Hacker News
by nicops 3550 days ago
> If you need to have a 'chill' with the original researcher to get critical details, that's not science anymore.

Certainly there is always the possibility that some details were misunderstood, that something needs to be clarified, that there was a print error, etc. Your "that's not science anymore" statement seems highly exaggerated. People are not supposed to communicate only via papers.


Sure, it's inevitable that, in some cases, eventually someone will have to go back to check notes because some factor mattered when no one thought to write it down.

But there's a difference between that and "we expect that someone will have to make personal contact with the original researchers in order to replicate it".

If you're explaining away replication failures by pointing to such lack of contact (as the quote does), that's confirmation of a problem (in keeping with the standards of science), not a vindication of the results.

There's an additional danger of making it so that "you can't replicate until you have social contact with the original researchers". That way lies favoritism: it's harder to criticize someone as you get closer to them socially, and they can withhold the capability of criticism by not engaging the critics.

Here's an example from my field, which I think is informative because it gives a concrete idea of how this plays out in a public arena.

tl;dr for this wall of text: 1) Authors A describe an algorithm; 2) Author B publishes a counter-example to show where #1 fails; 3) Authors A say it wasn't wrong, but that the author of #2 'misunderstood' and should have contacted them first, and in any case here are the missing details; 4) Author B points out that paper #1 should have said those details were missing; 5) Authors C point out that Authors A misunderstood many things in their own publications, so Authors A can't complain about others not contacting them first when they don't do it themselves.

"Canonical Numbering and Constitutional Symmetry" (1977), DOI: 10.1021/ci60010a014 describes an algorithm.

"Erroneous Claims Concerning the Perception of Topological Symmetry" (1978), DOI: 10.1021/ci60014a015 points out examples where the algorithm from the first paper, and from another paper, don't work.

The authors of the first paper followed up with "On the Misinterpretation of Our Algorithm for the Perception of Constitutional Symmetry" (1979), DOI: 10.1021/ci60017a012 :

> A recent paper in this journal contained critical comments on two methods for the perception of topological symmetry. Carhart’s claim that our algorithm does not correctly perceive topological symmetry and fails with certain structures is the result of a misinterpretation of our algorithm.

> Unfortunately, the author did not contact us directly to help him clarify his misunderstanding. This failure is unusual and difficult to understand. Thus, it was not until we received the recent issue of this journal that we learned of this misinterpretation.

> In our paper we were particularly aiming at catching the interest of the organic chemist for the problems of uniquely numbering the atoms of a molecule. Therefore, we put particular emphasis on the criteria for determining priorities among atoms to enable the chemist to manually number the atoms of molecules according to our procedure. We restrained from giving all small details of the algorithm to keep the paper concise, working under the assumption that persons interested in the details would contact us directly. It is astonishing that Carhart at the point where we did not fully elaborate on the details works with the premise that we misconceived the problem. Initially one should rather assume that other people, too, understand a problem. Only if explicit errors are found should one digress from this conviction.

Carhart followed up with a letter to the editor, "Perception of Topological Symmetry" (1979) DOI: 10.1021/ci60017a600 :

> I am delighted to see that my critique appearing in this Journal has encouraged C. Jochum and J. Gasteiger to present previously unreported steps in their algorithm for the canonical numbering of chemical graphs. They refer to these steps as “small details”, but in fact they are the very essence of any routine which reliably finds unique numberings for, ...

> However, I did not misunderstand their previous article (unless lack of clairvoyance can be classed as misunderstanding); I simply took it at face value. My critical comments, and the counterexamples I presented, were completely appropriate in the context of that article. In contrast with their latest offering, Jochum and Gasteiger’s previous paper did not present a sound and accurate definition of constitutional symmetry, nor did it indicate in any way that crucial steps had been omitted. I am sympathetic with the problems of describing a complex algorithm in the limited space of a journal article, but if space limits the development of a fundamental concept, it is the responsibility of the author to say so, and to indicate that a reader must obtain additional information before he tries to implement the described procedure.

It ended with still other people writing another letter to the editor, "Canonical Numbering" (1979), DOI: 10.1021/ci60019a600 :

> We have been following with some interest the controversy appearing in this Journal regarding canonical numbering and various types of [...] The first article by Jochum and Gasteiger contains a number of incorrect and misleading statements about both their work and the work of those who preceded them. ...

> Jochum and Gasteiger also strongly implied that they had a “simple” algorithm which gave complete partitioning, eliminating the need for a comparison step. Carhart correctly pointed out that this was not the case. Subsequent publication of the details of Jochum and Gasteiger’s indicated that it does contain a comparison step ...

> On a more general level Jochum and Gasteiger complain that Carhart did not contact them “directly to help him clarify his misunderstanding”. Yet it is obvious from the large number of misinterpretations and/or misrepresentations which appear in their work that they made no attempt to clarify their misunderstandings by discussing such matters with the original authors. Publishing last on a particular subject accords one considerable power, power that carries with it the responsibility to treat the preceding work with fairness and objectivity.
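
To make the "complete partitioning" versus "comparison step" point concrete, here is a minimal Python sketch. It is not Jochum and Gasteiger's algorithm and not one of Carhart's counterexamples; it assumes a generic neighborhood-refinement scheme (in the spirit of Morgan-style numbering) and a hypothetical 8-atom graph in which every atom has exactly three neighbors. The names `build`, `refine`, and `orbits` are made up for illustration. Refinement alone cannot split the atoms at all, while a brute-force check over all relabelings finds two distinct symmetry classes.

```python
from itertools import permutations


def build(edge_list):
    """Build an adjacency map {atom: set of neighbors} from bond pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def refine(adj):
    """Split atoms into classes by degree, then by neighbor classes, until stable."""
    colors = {v: len(nbrs) for v, nbrs in adj.items()}
    while True:
        signature = {
            v: (colors[v], tuple(sorted(colors[n] for n in adj[v]))) for v in adj
        }
        ranks = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        new_colors = {v: ranks[signature[v]] for v in adj}
        # Classes can only split, so an unchanged count means we are done.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors


def orbits(adj):
    """True symmetry classes, found by brute force over all relabelings."""
    nodes = sorted(adj)
    edge_list = [(u, v) for u in nodes for v in adj[u] if u < v]
    edge_set = {frozenset(e) for e in edge_list}
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        # A relabeling that sends every bond to a bond is a symmetry.
        if all(frozenset((mapping[u], mapping[v])) in edge_set for u, v in edge_list):
            for v in nodes:
                parent[find(v)] = find(mapping[v])
    return {frozenset(w for w in nodes if find(w) == find(v)) for v in nodes}


# Hypothetical graph: an 8-ring plus four chords, so every atom has degree 3.
ring = [(i, (i + 1) % 8) for i in range(8)]
chords = [(0, 2), (1, 3), (4, 6), (5, 7)]
graph = build(ring + chords)

refined_classes = set(refine(graph).values())
true_orbits = orbits(graph)
print(len(refined_classes), "class from refinement")
print(len(true_orbits), "classes from exhaustive comparison")
```

The brute-force step is only feasible for toy graphs; the point of the sketch is that some form of comparison beyond local refinement is needed, not how to do it efficiently.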

>People are not supposed to communicate only via papers.

A paper and its supplementary materials are supposed to be enough to reproduce the experiment. In practice, this often fails, but that is a fault in the scientific process. Science isn't just about empirical knowledge, it's about public and redundant empirical knowledge, as opposed to losing important knowledge of the natural world when the original investigator gets hit by a bus.

Wouldn't those problems in the scientific process get corrected more easily if you contacted the original author to see if there are any details that were missed and then publish those details with your results instead of just publishing a paper that says "Nope, couldn't reproduce"?

No, you publish the results that you cannot reproduce.

Then maybe the next generation of researchers documents their work better.

Or maybe the original researcher publishes a v2 edition of their paper.

> People are not supposed to communicate only via papers.

That we use written communication that persists through generations is the basis of science and society in general. If we cannot communicate sufficiently via papers, we're in a world of trouble.

I used to hold this opinion, but my experience with academic research changed my mind. Much of the scientific knowledge we have is passed from generation to generation by mentoring. The amount of knowledge is so vast, and our means of searching the written literature for relevant facts so poor, that when I want to learn something or solve a specific problem there is no substitute for a discussion with an expert in the field.

The core problem is that human communication is very difficult. It becomes even more difficult when we try to communicate ideas without interaction, as we do when writing a book and expect someone to read and understand it. If I read a paper and I can't understand a sentence, it might take me days to figure out what's going on by myself, whereas asking an expert might yield an answer in less than an hour (sometimes minutes). The difference is really orders of magnitude.

There are whole fields that have effectively died because no one works on them any more. That knowledge doesn't live in anyone's mind. All the literature is there, but actually acquiring that knowledge by reading the literature is incredibly challenging and time consuming.

I have come to believe that the main purpose of hiring scientists in academia is to keep knowledge alive and have it passed on to future generations. Advancing research is of secondary importance. In fact I would say that most new research I see probably has no intrinsic value. I include my own research in this category. We have researchers solving esoteric problems of no value to anyone besides their own personal entertainment. Except, working on such research keeps our neurons firing and keeps knowledge alive. It is a well known phenomenon that taking a break from research very quickly leads to a sort of decay of memory. Our learned ideas and the connections between them wither away without constant reinforcement. In order to keep knowledge alive we have to engage in research, even if it seems pointless.

>I have come to believe that the main purpose of hiring scientists in academia is to keep knowledge alive and have it passed on to future generations.

Then these scientists should be devoted to producing textbooks and courses which can then be taught to non-research students. Yes, all knowledge above the scale of what a single individual knows (and keeps on their shelves, hard drives, etc) is embodied in communities and traditions, but we still get far greater redundancy of that knowledge from teaching it as undergraduate or master's-level coursework than from passing it down only via research mentoring.

If 25% of the population gets an undergraduate degree, 11% or so gets a postgraduate degree, and only about 1.7% get a PhD, then we need to be embodying society's knowledge among the larger cohorts for that knowledge to survive. We can't afford to live in a world where only 1.7% know how things work.

> Then these scientists should be devoted to producing textbooks and courses ...

Textbooks and courses exist for everything but the most cutting-edge stuff (which is still in flux anyway), but they are a very inefficient way of transferring knowledge. I would say they are practically useless without expert guidance. At the most basic level, there are so many of them that an expert has to tell you which ones are both good and relevant to what you want to learn. I once saw a student waste months of his life studying a book he thought was relevant, only to discover that the book wasn't building towards the sort of knowledge he needed in that subject. The book was about the correct subject, but was focused on somewhat different aspects than the ones he was interested in. There was no way for him to know this in advance without guidance.

So we don't know how to organize existing books. Also, even the books that exist are usually pretty bad at conveying knowledge. Or perhaps humans are just pretty bad at learning things from books. Either way, no one knows how to write textbooks and courses that are much better than what we have today. I really don't know of a better way to preserve knowledge than the current one. Perhaps technology can improve the situation by making access to knowledge more interactive. But I suspect this would require a real breakthrough.

> We can't afford to live in a world where only 1.7% know how things work.

Why not?

I have a concrete counterexample. Let's say I write a paper presenting a model, plus some numerical results of large simulations. The code is based on gluing together various pieces of open source code. All these codes are typical scientist codes that are held together with duct tape. My paper is short, but I spent a lot of effort munging things together, and I'm fairly certain nobody can reproduce my results without my source code (preferably the whole environment) unless they spend a lot of time on trial and error like I did.

The tweaks I did to glue things together have no theoretical value and don't belong in the paper. As a practical matter, I can't fit a lot of source code into a short paper format.

What do?

Open source the code and supporting data.

It's not that simple. What if some of it is proprietary? What if I'm not allowed to submit code because I need to be anonymous so reviewers can maintain impartiality? What happens when one of the upstreams updates and breaks my code? Do I need to keep it updated? Forever?

At my institute at least, scientists are required to maintain everything that is necessary to reproduce a result for at least ten years. That includes all the data and the software used to produce the results. It's not an easy job, but it's important.

If your institute also mandates they make the data/software publicly available, then that's definitely the exception rather than the rule. It must also be hideously expensive.

It almost never happens that a paper I read actually comes with usable source code.
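
For the worry above about an upstream changing and breaking the code, one low-cost partial measure is to save a manifest of the environment next to the results. The sketch below uses only the Python standard library; the function name `environment_manifest`, the manifest keys, and the sample file `input.dat` are assumptions for illustration, not anything prescribed in this thread. It records versions and input checksums only. It does not archive the software itself, and it does nothing for the proprietary-code or anonymity concerns.

```python
import hashlib
import json
import platform
import tempfile
from importlib import metadata
from pathlib import Path


def environment_manifest(data_files=()):
    """Describe the interpreter, installed packages, and input files."""
    packages = sorted(
        f"{name}=={dist.version}"
        for dist in metadata.distributions()
        if (name := dist.metadata.get("Name"))  # skip installs with no name
    )
    checksums = {
        Path(path).name: hashlib.sha256(Path(path).read_bytes()).hexdigest()
        for path in data_files
    }
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
        "data_sha256": checksums,
    }


# Demo with a throwaway input file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "input.dat"
    sample.write_bytes(b"simulation input")
    manifest = environment_manifest([sample])

# Print a short excerpt; the full package list can be long.
print(json.dumps({k: manifest[k] for k in ("python", "data_sha256")}, indent=2))
```

Saving the full `manifest` as JSON alongside each set of results at least tells a later reader which versions produced them, even if rebuilding that environment still takes effort.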

Then your results are not reproducible and your conclusion is suspect.

A lot of thought has gone into such questions. For example, see the guidelines at https://www.epsrc.ac.uk/about/standards/researchdata/

> Let's say I write a paper presenting a model, plus some numerical results of large simulations. The code is based on gluing together various pieces of open source code. All these codes are typical scientist codes that are held together with duct tape. My paper is short, but I spent a lot of effort munging things together, and I'm fairly certain nobody can reproduce my results without my source code (preferably the whole environment) unless they spend a lot of time on trial and error like I did.

Then leave out the results since they are just an anecdote. If you want to include experimental results then it has to be done in a scientific fashion.

The usefulness of a paper that doesn't stand on its own is rather limited, though.

All papers should start with a dictionary? No, clearly not, so there's always going to be some assumed knowledge. Words change their meaning and have different meanings to different people, so we're already on to a loser just with the medium we're using.

So, the possibility of things like, say, a researcher not mentioning something that is standard practice in their lab that later is found to be a crucial part of the setup for an experiment seems high. But just like you don't want to provide a dictionary of standard terms with a paper you don't want to provide a list of the chemicals used to mop the floor, or a list of the lumen and colour temperature ratings of lights in the fume cupboards, or ...

IMO if a paper is not reproducible then yes it should be published but also the original team producing the paper should be challenged to reproduce the results. It's not a fight, we're all on the same team - work with them and try to find the reason for the lack of reproducibility.

> So, the possibility of things like, say, a researcher not mentioning something that is standard practice in their lab

I'd suggest a different formulation: "standard practice in their field"

Standard practice in general cooking? That's ok. Standard practice in my kitchen? That's a problem.

Research is, IMO, like a recipe that a knowledgeable chef should be able to reproduce.

Though it is understandable why one would forget to mention something. Especially if they thought it was general practice to do something their way.

Maybe a paper does stand on its own, with the large list of citations at the end. But maybe some of those citations are to journals that your institution doesn't subscribe to, or are historical and in another language, et cetera.

There is a page limit to publications in high impact journals, and generally it's not great practice to utilize the limited space on the details of hurdles overcome.

I would argue that some of the most important papers in science don't really stand on their own... they need context and expertise that the paper can't and shouldn't cover.