by sillysaurusx 1824 days ago
The fact that you had to say "chill" three times indicates that you're trying to convince yourself, not me.

None of what you said is responsive to what I wrote. I think it's an opinion piece, but I'm not sure.

The issue here is the scientific method. I've listed the things that are required, as I see it. And I've also listed the reasons why I haven't been able to verify it exists here, despite trying for two years.

I'm glad that you like ML hacking, and I like it too. But models aren't a godsend; they're "the most basic, bare-minimum requirements of reproducibility."

Your reaction shouldn't be "I'm incredibly grateful you'd be willing to do this." It should be "You're required to do this, because if I can't verify your claims, your claims might be mistaken."

To leave it off on a softer note, normally I'd bond with you, ML hacker to ML hacker. Because I love ML, and I love hearing what you've been up to in ML. It's the best job in the world, as far as I'm concerned. (Could any other career give you the opportunity to be a developer advocate for high-performance computing in such an interesting way? https://github.com/google/jax/issues/2108#issuecomment-86623... Definitely looking for more examples of "Github Larping," if you know of any.)

If you agree that the scientific method is the reason ML moves forward, all I'm doing here is protecting it.


I say chill because you're freaking out.

The scientific method is being followed here. Code is not needed for the scientific method to be followed. Nor is data. Literally every other field is able to advance without public code or data (as do most areas of CS, in fact). There's absolutely no reason to believe that they won't release their code. They have a history of doing so. Models and checkpoints are not the bare minimum for reproducibility. They describe their model enough in the paper. There's enough written in the paper (which is 30 pages) to reproduce the model. Will it be easy? No. But it can be done. And to be clear, I'm saying that the status quo of code being released is a godsend. This is not the norm in literally every other field/subfield. Code helps with reproducibility (and so should be encouraged) but is not required.

If you require someone else's code to reproduce results then you're not convincing me you're a good ML researcher nor programmer.

I'd be no scientist if I didn't update my priors.

I retract my claims. You're right. Thanks for calling me out.

I will say that it's... a gargantuan effort to do the things that you're proposing. But as someone who did them: you're right, you can. (With BigGAN-Deep, it took a year to track down the bug: https://github.com/google/compare_gan/issues/54)

BigGAN-Deep is a decent example of the thing I was really worried about: replication. I thought it'd be really easy to "just implement the paper." But no one had. Mooch did, but not at the same scale as the DeepMind release.

Maybe you're right about me, too. You're convincing me that I'm not a very good ML programmer. It's probably best to bow out on whatever high notes I've achieved.

Karras' work is fantastic. I don't know why this preview of things to come was where I chose to do this. Thank you, nVidia group, for working so hard.

Hey man, I respect that. I also understand your frustration. Reproduction is difficult. I'm going through it right now with a paper that has no code attached. You bet I'm pulling out my hair. I just think taking your frustration out on this paper is not the right vector. Please continue to call out papers that aren't reproducible. Please continue to push forward higher standards. But also recognize where we are and where we've come from. And most importantly, pick your battles. The passion is right, and I agree with the spirit of what you wrote, just not the direction.

And I'm not trying to say you suck. But you said you've been studying the subject for only 2 years. So I am going to check you. It's easy to grow an ego, but it often isn't useful. Sucking at something is the first step to being somewhat good at something. And you're clearly past the step of "sucking" but not to the step of "wizard." I don't know where you are between there tbh. But I do understand the frustration haha. That is normal.

Side note: usually it is good practice to note that you edited comments. It was rather confusing to look back and see something different.

> If you require someone else's code to reproduce results then you're not convincing me you're a good ML researcher nor programmer.

I call bullshit. In computer science, not releasing the code of an algorithm whose output you describe is akin to maliciously obfuscating your methods. No serious paper should be accepted without a script to reproduce the exact same results again.

> In computer science, not releasing the code of an algorithm whose output you describe is akin to maliciously obfuscating your methods.

Well tell that to my advisor (it's also something I've done in the past). So my experience doesn't reflect your claim.

> No serious paper should be accepted without a script to reproduce the exact same results again.

You do realize that this is a pre-print, right? If it went to NeurIPS then they did release the code to them and will release the code to the public later.

The repetition here is a common rhetorical device, not necessarily an indication of self-doubt.

That said, I agree with your overall position on ML publications. So much of what we see is a tech demo protected by some kind of moat, either a private commercial dataset or insatiable processing requirements or missing code or a combination of the above. These aren’t science, they’re advertisements.