by jfaucett 2991 days ago
> These programs tend to be both so sloppily written and so central to the results that it’s contributed to a replication crisis, or put another way, a failure of the paper to perform its most basic task: to report what you’ve actually discovered, clearly enough that someone else can discover it for themselves.

This is the crux of the problem IMHO - at least for the fields I study (AI/ML). Replicating the results in papers I read is way harder than it needs to be, i.e. for these fields it should just be a matter of firing up a Jupyter notebook and downloading the actual dataset they used (which is much harder to get your hands on than it seems). Very few papers actually contain links to all of this in a final polished form, so that the work is #1 understandable and #2 repeatable.

Honestly, I'd much rather have your actual code and data that you used to get your results than read through the research paper if I had to choose (assuming the paper is not pure theory) - but instead there is a disproportionate focus on paper quality over "project quality" at least IMHO.

I don't really know what the solution is since apparently most academics have been perfectly fine with the status quo. I feel like we could build a much better system if we redefined our goals, since I don't think the current system is optimal for disseminating knowledge or finding and fixing mistakes in research or even generally working in a fast iterative process.


I've had a paper peer reviewed. It was ultimately rejected but I can't help but suspect that by making all my code publicly available, I hurt my chances of publication. The reviewers' comments were about my coding style, my choice of build tool (I didn't use make, but something else which is just as easy to use), the choice of C vs C++...

It's like best practices for computer security -- always strive to minimize the attack surface. :) Without source code there is much less stuff to criticize!

>The reviewers' comments were about my coding style, my choice of build tool (I didn't use make, but something else which is just as easy to use), the choice of C vs C++...

I don't know precisely what field you submitted in, but this is maliciously bad reviewing practice. You should have submitted a rebuttal and written to the editor calling the relevance of such "reviews" into question.

Yeah, just reading about that made my blood boil. Especially if it was a biology paper, I don't know what I would have done...
> It's like best practices for computer security -- always strive to minimize the attack surface.

I suspect that's also why some papers are unnecessarily verbose and describe simple things in as complicated a way as possible. You can't criticize something that can't be understood.

It's unfair that your comment is downvoted because it's spot on.

Hiding code, obfuscating language, fudging data: all are symptoms of the same problem, namely being interested in getting a paper on the CV instead of doing research.

There are many circumstances that can put even a good scientist in a situation where he/she has to do this but that's not a good argument for not sharing the code.

Then why submit it for peer review at all?
Because we need peer reviewed papers on our CVs!

I also detest simple things made complex, though. In my experience (which has covered electronics, epidemiology and geography) reviewers tend to pick up on obtuse issues in text but miss glaring errors in the math. It's sad, and you can see why someone less than scrupulous would exploit that tendency by overcomplicating things. That said, I think plenty of authors are honest but just not very clear thinkers!

In machine learning / computer vision people often release their code after the paper is already accepted.

Time before the submission deadline is usually used to do more experiments and write text, not to polish the code. And after the deadline there is no hurry. What people (who want to share code) consider important is to release it by a little before the actual conference (but this doesn't transfer to journal-based fields).

The paper should be accepted in rough form, but publication should be held up pending approval of the data and code.

Or a paper should be published in a probationary form, and not certified (by the journal) until an independent lab replicates the result. A paper that isn't making adequate progress toward replication should be retracted by the publishing journal.

That's terribly unfortunate.

I'm entirely open to being shown otherwise, but working in that field seems more akin to working in physics than in software engineering (those are engineering problems, after all, not computer science; and sometimes they're only opinions)— and being critiqued for that in an ML/AI paper would be like critiquing a physics paper over the author's coding style— it is misdirected, IMO.

They could be squashing some legitimately good work by being too heavy handed around coding style and build process.

This is why artifact reviews should be separated from publication review. Artifact reviews are notoriously horrible, with reviewers inexperienced in it simply bike shedding and picking at straws. At best, artifact reviews should simply be checking for reproducibility (i.e. can they get the code to run at all).
I'm very surprised to hear this: I've submitted several artefacts; co-run an AEC for a (small-medium) conference; and spoken to a lot of people about it. I've heard virtually nothing negative until your post. Indeed, the artefact reviews I've received have nearly all been thorough and considered (one review was slightly nitpicky, but that's one review out of 10-12). For paper reviews, on the other hand, I'm very happy if 1/2 of reviews are thorough and considered. Bear in mind that most of the artefact reviewers also implicitly review the paper, and you get some idea of how good a job they do.

My main bugbear with the whole thing is the incorrect spelling of "artefact". And when that's my main bugbear... well, things aren't too bad!

I recently reviewed a computational materials science paper and was quite impressed by the fact that they included some data and a Jupyter notebook. Long term, ecosystem will be an issue, but in the short term, it's invaluable. It does make it easier to check for obvious errors. I think more incentives should be given by funding agencies to encourage this.

I'm really sorry to hear about your experience.

I don't work primarily in computer science, but rather in math/physics, and both as a reviewer and as an author, I have only seen a positive impact for sharing code. When I review, if code is made available, it is easy for me to see the details of a model or a calculation, which I really appreciate. When I am writing a paper and developing a model, knowing that I will make my code available ensures that I write things in a clear, transferable, and understandable way (which ultimately ends up being quite beneficial to me).
Then do we even need peer review? In my experience it is always superficial, people just feel that they have to say something, so they say something about writing style, or similar trivia.

The way it should work is you put your stuff with code and all data on github. People interested in the field or working for journals read it, and rate it, journals collect links to paper repositories that are highly rated by scientists who have many highly rated papers in the field, and call that publication.

I've been peer reviewed once (and waiting for the second) and it was very in depth, giving me a couple of pointers to improve my paper. Field was mathematics, though.
Sure, it depends on the field, the journal and the reviewer, but with a GitHub-like interface and public reviews it will only get better.
Problem is, if the review is public then the article is also public, and some (most) publishers are not OK with that. At least not yet; hopefully it will get better.
I don't know what field you work on but this would be very atypical in mine.

While it's true that minimizing the attack surface is something that can work in papers, in my field reviewers typically don't look at the code. Many of my papers include code or links to it, and I haven't ever had a comment about it in reviews.

Aaah, this brought back memories of the 7 years I was in academia. The whole publish-or-perish culture and the peer reviewing process are completely broken. Academia is completely broken; I would hate having to go back now that I have spent 7 years in industry and am earning 6 figures.
Nah, they probably used your code to scoop you.
I also work in the AI/ML field (deep learning), and usually I don't care whether a paper has corresponding code or not. I read papers to find good ideas. If I find one, I can implement it myself. I rarely need more than a couple of days to test an idea (e.g. Hinton's capsules model took 4-5 hours to implement). The benefits of having your own implementation should be obvious.

If something important is missing or does not make sense, I usually just email the first author. Usually they respond within a couple of days, and unlike looking at code, I can also get an explanation of why they did it that way.

In fact, I don't even usually care that much about stated results (such as improvements in state of the art).

Things that matter are: deep insight into a problem, new angle to look at something, discovery of a new phenomenon, high quality explanation, practical tricks to save resources, and comprehensive prior/related work review. That's why I read papers.

this is the right way to go about things if you have certain goals, for sure.

sometimes you need to replicate exactly the same training method, on exactly the same data — for instance if you want to use it as a baseline on a known dataset. then it becomes really important to have the code, because while an adequate replication might be easy, it takes a lot of trial and error to get perfectly the same model.
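To make the "perfectly the same model" point concrete, here is a minimal sketch of why the seed matters for an exact baseline. It uses a toy one-parameter model and only the Python standard library; `train_toy_model` and its parameters are hypothetical stand-ins for a real training script. In a real framework, fixing the seed is necessary but often not sufficient, since library versions and hardware can also change the numbers.

```python
import random


def train_toy_model(seed, steps=1000, lr=0.01):
    """Fit y = 3x with noisy SGD; a stand-in for a real training run."""
    rng = random.Random(seed)  # private generator, so no hidden global state
    w = rng.uniform(-1.0, 1.0)  # random initialisation
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + rng.gauss(0.0, 0.1)  # noisy target
        grad = 2.0 * (w * x - y) * x  # derivative of the squared error w.r.t. w
        w -= lr * grad
    return w


if __name__ == "__main__":
    # Same seed: the two runs can be compared for exact equality.
    print(train_toy_model(seed=0) == train_toy_model(seed=0))
    # Different seed: an adequate replication, but not the same model.
    print(train_toy_model(seed=0), train_toy_model(seed=1))
```

Both seeds land close to the true weight, which is the "adequate replication" case; only the shared seed gives the exact same number, which is what a baseline comparison needs.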

Sure, but if the code for some result is not available, I feel free to report whatever result I got implementing their method. I’m also perfectly fine with using “couldn’t reproduce” phrase in my papers.
You seem to be exceptionally well funded, and/or have few deadline constraints. Your strategy will only work until you get spammed with "good ideas".
> You seem to be exceptionally well funded, and/or have few deadline constraints

I wish! :)

> you get spammed with "good ideas"

Again, I wish!

In the subfield I'm focused on at the moment (efficient mapping of NN algorithms to specialized hardware, low precision computation, model compression) I don't see good ideas very often (fewer than one good paper a week). Previously I worked on music generation - also didn't really feel spammed with good ideas.

I don't mean this to be adversarial, but what exactly is it you do that would not be sped up by checking someone else's results directly before fiddling around and then trying out your own implementation?
But that's my point: their results are not that important to me.

As an example, recently I saw a paper on NN weight quantization, which had a very interesting idea, but the results were not impressive. I don't remember if they had any code published or not, but it didn't matter - I wanted to see what kind of results I'd get if I implemented it. Turned out it works really well, much better than what they reported in the paper.

Here is an idea: inverse dropout.

How would you implement that?

What's your preferred software to implement these? A framework like Chainer, or pure NumPy/MATLAB?
Tensorflow or Pytorch. Plain Numpy for quick prototyping/testing. Sometimes have to write/modify Cuda kernels.
I would not say that most academics are "perfectly fine with the status quo". But I would say that most academics have enough competing interests taking their time away from research that they're uninterested in taking on another one with such uncertain payoff.

In a way bringing about the kind of change you reference in scientific publishing would actually be a pretty significant research accomplishment -- the field would be that much better for your efforts! But the road to get there is filled with political wrangling, talking to and serving on committees, probably forming dedicated organizations and painstakingly getting buy-in. This is not something you can realistically achieve without probably a good career's worth of political capital in your field and the drive and people skills to make it happen.

Until it does happen, making your own lab adhere to these standards is admirable but with unfortunately limited upside. I'm not saying the status quo is good, just that there are reasons for it still being the status quo.

>"Honestly, I'd much rather have your actual code and data that you used to get your results than read through the research paper if I had to choose (assuming the paper is not pure theory) - but instead there is a disproportionate focus on paper quality over "project quality" at least IMHO."

I think at first new students know this is wrong but then get dragged into the circular logic of:

  it is standard in the field -> it is ok -> it is standard in the field
It starts with just being so busy and confronted with so many new things that you just use the standard behavior as a "stand-in" (no pun intended) for a rational approach. Then you never have the time to go back and reassess that decision.
> I don't really know what the solution is since apparently most academics have been perfectly fine with the status quo.

Simple - change the incentives. Currently, academics are evaluated based on paper publications not "actual code". If you want code and data to be shipped, create enough incentive for them and you'd see the change.

> there is a disproportionate focus on paper quality over "project quality"

One problem is bitrot. Stuff that runs now is not guaranteed to work in 1 or 2 years, let alone 10 years.

Even more so when it runs on fancy hardware, like GPUs.

This is one of the main reasons to require source release. Open source software is much more likely to run in 10 years. It’s actually useful to package everything together into a container or VM so all the packages are there too.
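As a lighter-weight complement to a container, one could at least record the exact interpreter and package versions next to the results. A minimal sketch using only the Python standard library; the helper name `environment_manifest` and the layout of its output are illustrative assumptions, not an established convention:

```python
import json
import platform
import sys
from importlib import metadata


def environment_manifest():
    """Collect interpreter, OS and installed-package versions (hypothetical helper)."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Store this next to the results so a later reader knows what to reinstall.
    print(json.dumps(environment_manifest(), indent=2))
```

This only records what was installed; it does not preserve the packages themselves, which is the part a container or VM image covers.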

I work with some genome guys and they have this problem: advances are so fast that their sequencers basically turn over in a year or two. So they have to maintain the specimen as well as all the software versions they used for analysis. It’s a pain, but otherwise nothing is reproducible.

The work should be reproducible not just from artifacts, but also from a container. Sourcing compilers, libraries, etc. is almost impossible. The NSF should really be running an archive and cluster for housing reproducible research that remains executable far into the future.
I'm in opto/bio/eng. I think you misunderstand the 'real' reason for research papers as they currently stand: Money. It's a bit of a path, but I'll try and explain.

In the US at least, research costs a LOT of cash. Many departments are chronically underfunded. In my state, the university only gets ~10% of its funding from the state-house. The rest is grants. The only real writers of grants are the professor corps. So, departments look to the professors to fund the enterprise. Some of my advisers spent about 40 hours per week just on grant writing, neglecting the teaching and research hours required alongside. It is not a fun/good job. So most/all research is done by students, mostly PhD students, with little to no input from their advisers, and it's a stressful mess. As a result, most research is, well, amateur. Stats get mangled, code quality is non-existent, rats get loose, etc. Yes, yes, none of that 'actually' happens, but for real? It's a shitshow.

So, where does that leave the PhD student that has been in the program for 7 years? They may have one first author paper, if that, a thumb-drive filled with nearly unreadable 'data', and a dozen failed experiments. Failed experiments don't get published, mostly because science is hard and doing all the controls to say that you have a genuine/real failure is much harder. So the professor, now running into a very firm deadline to graduate the student via the grad office, must rush and publish something, just to get the student to leave. The professor's track record in graduating students is part of their evaluation, as well as their publication record. Hence, the unreadable graduation paper; one of two types of unreadable paper.

This paper is a targeted missile that is meant to do one thing: get the student off the payroll. It is not meant to be good, or a viable piece of science. It is never meant to be replicated. It is trying to be obtuse. It is there just to graduate a student, nothing more, nothing less.

The other class of unreadable paper is the turf-war paper. These papers are also meant to be just readable enough, but not so much as to be repeatable. The reason is that the paper is a 'big' paper. What is published is meant to stake a claim in a 'big' area of the field. Hopefully this will guarantee more funding in the future as now that professor is a 'big' player in it. Hopefully no others can report that it is unrepeatable before the next grant comes in. The trick is to make certain that the paper exposes just enough of the experimental design to truly 'claim' the new big thing, but not enough that you can replicate it at all. Karl Disseroth is infamous for this in the bio world. The paper creates jazz, but safeguards the turf of the lab from any other lab that may want to replicate it independently; they need the first lab to re-do it, and they must come with funding in hand.

So, to sum up: papers are weapons. One type is the missile that causes a student to graduate. The other is a trap with a golden idol on it.

This is spot on. I was surprised the first time I worked at a major university just how toxic the environment was and how little mindshare was spent towards actually contemplating compelling hypotheses / experiments. It was much less of the ideal "life of the mind" I thought it would be and much more like show business / social climbing, minus the widespread name recognition and glamor.

I was already on the way out of science when I started working at that job, but the publish or perish culture really accelerated my departure.

It's also interesting how the current incentives warp priorities not just at big research universities, but also at small liberal arts colleges. I grew up as a fac brat, and so I've been able to tune into a lot of dialogue about the latest crop of new professors coming in to replace older professors as they retire, and a lot of the older professors are genuinely shocked at how little emphasis the newer professors place on teaching (traditionally what SLACs have focused on) compared to research. Even at schools with around 2000 students, new professors are demanding generous starter packages that no one would really have thought to ask for in the 70s.

To be fair, it's been ~50 years since the 70s. The Professor Corps should pretty much be entirely different people.
From an ideal point of view, I agree with your criticism. Probably most honest academics would, as we all have had frustrations after spending a lot of time trying to reproduce someone else's research. But it is very difficult to solve this problem.

Peer review takes a large amount of time from most academics, time that is totally unpaid. With the status quo, we are OK with that - it's a service we do for each other (we need our own papers reviewed, after all) and reviewing also has the advantage of exposing us to new ideas sooner. Precisely in AI/ML, though, many academics are currently complaining: due to the rapid expansion of the field, the peer-review load has gone beyond acceptable in many cases. For the last AAAI conference I had to review 6 papers on a fairly short deadline. In the last 3 months I have reviewed something like 40 papers, and I'm very far from being a top-tier star in my field; there are people who are probably getting many more review requests (although they're probably saying no to some if they want to keep their sanity).

Reviewing code and data seriously can take, how long? I would estimate an order of magnitude more than reviewing a conventional research paper in PDF.

So currently, the situation is that if you post a link to source code you may get some positive reaction in the reviews, but in 99% of the cases reviewers are not going to actually look at the code (or at least not beyond a cursory look to see if it seems coherent at a first glance) because there is just no time.

Unless we fix this, I don't think we will see papers really focusing on the code and data, regardless of good intentions.

Have you seen OpenML? There are solutions for this, and I think most people would agree they are useful, it's just the change/adoption/standardization cost is high as always.
Seems like this could pair well with the journal crisis and suggestions to implement a blockchain journal: Your paper cannot be accepted by the journal unless it includes executable code; the results of which are then injected into the "paper" view...?

So, basically - a paper consists of what it takes to replicate the paper, and the blockchain journal's first step is running the replication.
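The "first step is running the replication" part can be sketched independently of how the journal is hosted. Below is a minimal, hypothetical check in Python: the submission supplies its data, an analysis function and the numbers it claims, and the journal reruns the analysis and compares. The names `replicate` and `toy_analysis`, the tolerance and the toy data are all assumptions made for illustration.

```python
import math
import statistics


def replicate(analysis, data, reported, rel_tol=1e-6):
    """Rerun `analysis` on `data` and compare each value with the reported one."""
    recomputed = analysis(data)
    report = {}
    for key, claimed in reported.items():
        actual = recomputed.get(key)
        ok = actual is not None and math.isclose(actual, claimed, rel_tol=rel_tol)
        report[key] = (claimed, actual, ok)
    return report


def toy_analysis(data):
    # Stand-in for the paper's real pipeline (hypothetical).
    return {"mean": statistics.mean(data), "pstdev": statistics.pstdev(data)}


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    reported = {"mean": 5.0, "pstdev": 2.0}  # values as claimed in the "paper"
    for key, (claimed, actual, ok) in replicate(toy_analysis, data, reported).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{key}: claimed={claimed} recomputed={actual} {status}")
```

A check like this only confirms that the shipped code reproduces the shipped numbers; it says nothing about whether the analysis itself is sound, and it does not address the cost of rerunning expensive computations.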

This would be problematic for papers that require expensive computation, however...

So where, exactly, is a block chain required here? Everything you listed could just as easily be a requirement set by the journal, after all. I mean, every journal has at least some requirements already (at the very least, nearly all require publishing in a specific language). So aside from jumping on the blockchain bandwagon just because that's the new exciting thing, what value is added here?

Good God, I'm getting tired of every single thing needing to use the magic word 'blockchain' ATM.

https://news.ycombinator.com/item?id=16737642

It actually seems like journals could benefit from the application of this technology.

So yes, you don't need a block chain to set these requirements, and if you're using a blockchain you don't need these requirements, _but_, a blockchain journal and these requirements would likely pair very well together, as they cover respective weak-points (centralized journals might only ensure the journal's publisher can replicate; decentralized journals have to have some kind of automated validation).

Buzzwords become buzzwords because there's something to them, after all.