Hacker News
by sillysaurusx 1820 days ago
I wasn't sure whether to post my edit as a separate comment or not, but I significantly expanded my comment just now, which helps explain my position.

I'd be very interested in your thoughts on that position, because if it's mistaken, I shouldn't be saying it. It represents whatever small contribution I can make to fellow new ML researchers, which is roughly: "watch out."

In short, for two years, I kept trying to implement stated claims -- to reproduce them in exactly the way you say here -- and they simply didn't work as stated.

It might sound confusing that the claims were "simply wrong" or "didn't work." But every time I tried, achieving anything remotely close to "success" was the exception, not the norm.

And I don't think it was because I failed to implement what they were saying in the paper. I agree that that's the most likely thing. But I was careful. It's very easy to make mistakes, and I tried to make none, as both someone with over a decade of experience (https://shawnpresser.blogspot.com/) and someone who cares deeply about the things I'm talking about here.

It takes hard work to reproduce the technique the way you're saying. I put all my heart and soul into trying to. And I kept getting dismayed, because people kept trying to convince me of things that either I couldn't verify (because verification is extremely hard, as you well know) or were simply wrong.

So if I sound entitled, I agree. When I got into this job, as an ML researcher, I thought I was entitled to the scientific method. Or anything vaguely resembling "careful, distilled, correct knowledge that I can build on."

2 comments

I think that not being able to reproduce the results claimed in a paper is not specific to ML research. While working as a post-doc at a top university research lab, I spent years trying to understand how it can be that some software that was supposed to correspond to a well-cited paper did not even come close to reproducing the results of said paper, and that the primary author went on to become a professor at a top university in the US. In short, scientific fraud is also quite common, in most academic papers.
Thank you!!

I spent years trying to understand how it can be that some software that was supposed to correspond to a well-cited paper did not even come close to reproducing the results of said paper,

This was my exact experience. I didn’t understand why I kept having it, and kept blaming myself for not being careful enough. My code must be wrong, or the data, or something.

Nah. It was the idea.

Kept feeling like a kick in the gut, until here we are today, when I’m warning everyone that Karras, of all people, might publish such a thing.

I really appreciate that you posted this, because I’m so happy I wasn’t alone in the feeling of “what’s going on, here…?”

That seems like a worthwhile thing to publicize in and of itself?

The replication crisis in psychology threw out 50% or so of supposed scientific results.

If this (or just straight fraud) is common elsewhere, it seems like knowing about that would be a good thing for science.

I agree. Most top conferences nowadays publish reviews openly, which I think addresses this issue. Also, it is easier said than done: this is endemic in so many different academic settings, not just in the US but also in Europe.
I get your frustrations with this state of affairs, but for the reasons I mentioned above, I don't think providing the model and code is a panacea here. Maybe the last few years have also set an unrealistic expectation for the pace of progress. In my (former) field of theoretical neuroscience, if a paper was not reproducible, this knowledge kind of slowly diffused through the community, mostly through informal conversations with people who tried to reproduce or extend a given approach. But this takes several years, not the kind of timescale that modern ML research operates on.

Fwiw I think actual knowledge is there in the ML literature, but it's not in these benchmark-chasing, highly tuned papers. It's more high-level stuff, like basic architecture building blocks etc. GANs and Transformers, for example. They undeniably work, and the knowledge needed to implement them can probably be conveyed in a few pages maximum. No need for an implementation to be provided by the author, really.

I have no particular expertise here, but I wonder if you've learned to accept a mostly-broken process? We have the Internet, so why settle for slow diffusion over years instead of rapid communication?

Why should graduate students have to spend years trying to reproduce stuff that turns out to be no good? Nobody should have to put up with getting their time wasted like that.

I think it is a social problem, not a technical one. A healthy research field should have some level of cooperation between participants. If you go ahead and publish a "this does not reproduce" paper, you can easily ruin someone's career, so in most cases you don't. I know this is not the platonic ideal of science, but it is the reality, especially in smaller research communities. I agree this is not ideal, but I'm not sure I would call the process broken.
This concern over ruining someone’s career itself seems like a symptom of a broken process? Making it safe to openly discuss failures is important.

In at least some big companies in the private sector we have “blameless postmortems” where we describe what went wrong in an operational failure without blaming the participating employees.

Sure, having blameless postmortems would be amazing, but I think the informal process I described is probably the closest you will get to it. The reason is that any given subfield can't make this decision in isolation, because the people to whom it matters (funding agencies, faculty search committees, journal editors to a degree) are not part of the field, and when they see such a 'blameless postmortem' they will think 'whoa, this person really messed up, we'd better stay away'.

Maybe I am wrong though, and a better culture is possible, just as the shift to preprints has happened in a lot of fields and was probably previously unthinkable. So good on you for taking an idealistic stance; I am probably just being grumpy. That being said, whatever culture changes may be beneficial, I stand by my original point that simply dumping code and model alongside the paper is not unambiguously good and may even obscure problems.

To be honest, I don't think the StyleGAN papers are benchmark chasing. If you read StyleGAN[0], StyleGAN2[1], StyleGAN2-ADA[2], and this paper, there is a clear story. They call out the mistakes in the previous papers and resolve them. The papers themselves even admit where they fall short. But it is research. Problems don't get solved all at once. If you pay attention to these 4 papers, it is very clear Karras has a well-defined research focus and direction. He's showing his progress over time, sharing it with the community, and learning from the community as well. This is how research should happen.

[0] https://arxiv.org/abs/1812.04948

[1] https://arxiv.org/abs/1912.04958

[2] https://arxiv.org/abs/2006.06676

Good point. I wanted to make a more general point about the state of the ML literature; I am not super familiar with this particular sequence of papers.