by Ar-Curunir 535 days ago
This is much too negative. Peer review indeed misses issues with papers, but by-and-large catches the most glaring faults.

I don’t believe for one moment that the vast majority of papers in reputable conferences are wrong, if only for the simple reason that putting out incorrect research gives an easy layup for competing groups to write a follow-up paper that exposes the flaw.

It’s also a fallacy to state that papers aren’t reproducible without code. Yes code is important, but in most cases the core contribution of the research paper is not the code, but some set of ideas that together describe a novel way to approach the tackled problem.


I spent a chunk of my career working on productionizing code from ML/AI papers, and a huge part of them are outright not reproducible.

Mostly they lack critical information (missing chosen constants in equations, outright missing information on input preparation or chunks of "common knowledge algorithms"). Those that don't lack it have measurements that outright didn't fit the reimplemented algorithms, or only reached their claimed quality on the author's handpicked, massaged dataset.

It's all worse than you can imagine.

That’s the difference between truly new approaches to modelling an existing problem, or coming up with a new problem. No set of a bit different results or missing exact hyperparameter settings really invalidates the value of the aforementioned research. If the math works, and is a nice new point of view, it's good. It may not even help anyone with practical applications right now, but may inspire ideas further down the line that do make the work practicable, too.

In contrast, if the main value of a paper is a claim that they increase performance/accuracy in some task by x%, then its value can be completely dependent on whether it actually is reproducible.

Sounds like you are complaining about the latter type of work?

> No set of a bit different results or missing exact hyperparameter settings really invalidates the value of the aforementioned research.

If this is the case, the paper should not include a performance evaluation at all. If the paper needs a performance evaluation to prove its worth, we have every right to question the way that evaluation was conducted.

I don't think there's much value in theoretical approaches that lack important derivation data either, so no need to try to split the papers like this. Academic CS publishing is flooded with poor-quality papers in any case.

I spent 3 months implementing a paper once. Finally, I got to the point where I understood the paper probably better than the author. It was an extremely complicated paper (homomorphic encryption). At this point, I realized that it doesn't work. There was nothing about it that would ever work, and it wasn't for lack of understanding. I emailed the author asking to clarify some specific things in the paper, but they never responded.

In theory, the paper could work, but it would be incredibly weak (the key turned out to be either 1 or 0 -- a single bit).

Do you have a link to the paper?

+1

Anecdotally it is not. Most papers in CS I have read have been bad and impossible to reproduce. Maybe I have been unlucky but my experience is sadly the same.

> by-and-large catches the most glaring faults.

I did not dispute that peer review acts as a filter. But reviewers are not reviewing the science, they are reviewing the paper. Authors are taking advantage of this distinction.

> if only for the simple reason that putting out incorrect research gives an easy layup for competing groups to write a follow-up paper that exposes the flaw.

You can’t make a career out of exposing flaws in existing research. Finding a flaw and showing that a paper from last year had cooked results gets you nowhere. There’s nowhere to publish “but actually, this technique doesn’t seem to work” research. There’s no way for me to prove that the ideas will NEVER work, only that their implementation doesn’t work as well as they claimed. Authors who claim that the value is in the ideas should stick to Twitter, where they can freely dump all of their ideas without any regard for whether they will work or not.

And if you come up with another way of solving the problem that actually works, it’s much harder to convince reviewers that the problem is interesting (because the broken paper already “solved” it!)

> in most cases the core contribution of the research paper is not the code, but some set of ideas that together describe a novel way to approach the tackled problem

And this novel approach is really only useful if it outperforms existing techniques. “We won’t share the code but our technique works really well we promise” is obviously not science. There is a flood of papers with plausible techniques that look reasonable on paper and have good results, but those results do not reproduce. It’s not really possible to prove the technique “wrong”, but the burden should be on the authors to provide proof that their technique works and on reviewers to verify it.

It’s absurd to me that mathematical proofs are usually checked during peer review, but in other fields we just take everyone at their word.

They aren’t necessarily wrong but most are nearly completely useless due to some heavily downplayed or completely omitted flaw that surfaces when you try to implement the idea in actual systems.

There is technically academic novelty so it’s not “wrong”. It’s just not valuable for the field or science in general.

I don't think anyone is saying it's not reproducible without code, it's just much more difficult for absolutely no reason. If I can run the code of an ML paper, I can quickly check if the examples were cherry-picked, swap in my own test or training set... The new technique or idea was still the main contribution, but I can test it immediately, apply it to new problems, optimise the performance to enable new use-cases...

It's like a chemistry paper for a new material (think the recent semiconductor thing) not including the amounts used and the way the glassware was set up. You can probably get it to work in a few attempts, but then the result doesn't have the same properties as described, so now you're not sure if your process was wrong or if their results were.

More code should be released, but code is dependent on the people or environment that run it. When I release buggy code I will almost always have to spend time supporting others in how to run it. This is not what you want to be doing with a proof of concept meant to prove an idea.

I am not published, but I have implemented a number of papers in code, and it works fine (mostly hashing, protocols and search). I have also used code dumps to test something directly. I think I spend less time on code dumps, and if I fail I give up more easily. That is the danger: you start blaming the tools instead of how well you have understood the ideas.

I agree with you that more code should be released. It is not a solution for good science though.

Sharing the code may also share the incorrect implementation biases.

It's a bit like saying that to help reproduce the experiment, the experimental tools used to reach the conclusion should be shared too. But reproducing the experiment does not mean "having a different finger clicking on exactly the same button", it means "redoing the experiment from scratch, ideally with a _different experimental setup_ so that it mitigates the unknown systematic biases of the original setup".

I'm not saying that sharing code is always bad; you give examples of how it can be useful. But sharing code has pros and cons, and I'm surprised how often people don't understand that.

If they don't publish the experimental setup, another person could use the exact same setup anyway without knowing. Better to publish the details so people can actually think of independent ways to verify the result.

But they will not make the same mistakes. If you ask two people to build a piece of software, they can use the same logic and build the same algorithm, but what are the chances they will introduce exactly the same bugs?

Also, your argument seems to be "_maybe_ they will use the exact same setup". So it already looks better than the solution where you provide the code and they _will for sure_ use the exact same setup.

And "publish the details" means explaining the logic, not sharing the exact implementation.

Also, I'm not saying that sharing the code is bad, but I'm saying that sharing the code is not the perfect solution, and people who think not sharing the code is very bad usually don't understand the dangers of sharing it.

Nobody said sharing the code "is the perfect solution". Just that sharing the code is way better and should be commonplace, if not required. Your argument that not doing so will force other teams to rewrite the code seems unrealistic to me. If anyone wants to check the implementation they can always disregard the shared code, but having it allows other, less time-intensive checks to still happen: like checking for cherry-picked data, as GP suggested, looking through the code for possible pitfalls etc. Besides, your argument could be extended to any specific data the paper presents: why publish numbers so people can get lazy and just trust them? Just publish the conclusion and let other teams figure out ways to prove/disprove it! - which is (more than) a bit ridiculous, wouldn't you say?

> Just that sharing the code is way better

And I disagree with that and think that you are overestimating the gain brought by sharing the code and are underestimating the possible problems that sharing the code brings.

At CERN, there are 2 general-purpose experiments, CMS and ATLAS. The policy is that people from one experiment are not allowed to talk about ongoing work with people from the other. Note that this is officially forbidden, not "if some want to discuss, go ahead, others may choose to not discuss". Why? Because sharing these details ruins the independence of the 2 experiments. If you hear from your CMS friend that they have observed a peak at 125 GeV, you are biased. Even if you are a nice guy and try to forget about it, it is too late, you are unconsciously biased: you will be drawn to check the 125 GeV region and possibly mistake a fluctuation for a peak that you would not have noticed otherwise.

So, no, saying "I give the code but if you want you may not look at it" is not enough, you will still de-blind the community. As soon as some people look at the code, they are biased: if they try to reproduce from scratch, they will come up with an implementation that is different from the one they would have come up with without having looked at the code.

Nothing too catastrophic either. Don't get me wrong, I think that sharing the code is great, in some cases. But this picture of saying that sharing the code is very important is just a misunderstanding of how science is done.

As for the other "specific data", yes, some data is better not shared either if it is not needed to reproduce the experiment and can be a source of bias. The same could be said about everything else in the scientific process: why is sharing the code so important, but not sharing all the notes of each and every meeting? I think that often the people who don't understand that are software developers, and they don't understand that the code that the scientist creates is not the science, it's not the publication, it's just the tool, the same way a pen and a piece of paper were. Software developers are paid to produce code, so code is for them the end goal. Scientists are paid to do research, and code is not the end goal.

But, as I've said, sharing the code can be useful. It can help other teams working on the same subject to reach the same level faster or to notice errors in the code. But in both cases, the consequence is that these other teams are not producing independent work, and this is the price to pay. (And of course, there are degrees of dependence: some publications tend to share too much, others not, but it does not mean some are very bad and others very good. Not being independent is not the end of the world. The problem is when someone considers that sharing the code is "the good thing to do" without understanding that.)

What you're deliberately ignoring is that omitting important information is material to a lot of papers because the methodology was massaged into desired results to create publishable content.

It's really strange seeing how many (academic) people will talk themselves into bizarre explanations for a simple phenomenon of widespread results hacking to generate required impact numbers. Occam's razor and all that.

If it is massaged into desired results, then it will be invalidated by facts quite easily. Conversely, obfuscating things is also easy if you just provide the whole package and just say "see, you click on the button and you get the same result, you have proven that it is correct". Not providing code means that people will redo their own implementation and come back to you when they see they don't get the same results.

So, no, no need to invent that academics are all part of this strange crazy evil group. Academics are debating and are being skeptical of their colleagues' results all the time, which is already contradictory to your idea that the majority is motivated by fraud.

Occam's razor is simply that there are some good reasons why code is not shared, going from laziness to lack of expertise on code design to the fact that code sharing is just not that important (or sometimes plainly bad) for reproducibility, no need to invent that the main reason is fraud.

Ok, that's a bit naive now. The whole "replication crisis" is exactly the term for bad papers not being invalidated "easily". [1]

Because - if you'd been in academia - you'd find out that replicating papers isn't something that will allow you to keep your funding, your job and your path to the next title.

And I'm not sure why you jumped to "crazy evil group" - no one is evil, everyone is following their incentives and trying to keep their jobs and secure funding. The incentives are perverse. This willing blindness against perverse incentives (which appears both in US academia and corporate world) is a repeated source of confusion for me - is the idea that people aren't always perfectly honest when protecting their jobs, career success and reputation really so foreign to you?

[1]: https://en.wikipedia.org/wiki/Replication_crisis

That's my point: people here link the replication crisis to "not sharing the code", which is ridiculous. If you just click on a button to run the code written by the other team, you haven't replicated anything. If you review the code, you have replicated "a little bit" but it is still not as good as if you had recreated the algorithm from scratch independently.

It's very strange to pretend that sharing the code will help the replication crisis, while the replication crisis is about INDEPENDENT REPLICATION, where the experiment is redone in an independent way. Sometimes even with a totally orthogonal setup. The closer the setup, the weaker the replication.

It feels like watching the finger that points at the moon: not understanding that replication does not mean "re-running the experiment and reaching the same numbers".

> no one is evil, everyone is following their incentives and trying to keep their jobs and secure funding

Sharing the code has nothing to do with the incentives. I will not lose my funding if I share the code. What you are adding on top of that is that the scientist is dishonest and does not share because they have cheated in order to get the funding. But this is the part that does not make sense: unless they are already established enough to have enough aura to be believed without proof, they will lose their funding because the funding comes from a peer committee that will notice that the facts don't match the conclusions.

I'm sure there are people who downplay the fraud in the scientific domain. But pretending that fraud is a good strategy for someone's career, and that this is why people commit fraud so massively that sharing the code is rare, is just ignorance of the reality.

I'm sure some people commit fraud and don't want to share their code. But how do you explain why so many scientists don't share their code? Is that because the whole community is so riddled with cheaters? Including cheaters who happen to present conclusions that keep being proven correct when reproduced? Because yes, there are experiments that have been reproduced and confirmed and yet the code, at the time, was not shared. How do you explain that if the main reason to not share the code is to hide cheating?