| The fact that you had to say "chill" three times indicates that you're trying to convince yourself, not me. None of what you said is responsive to what I wrote. I think it's an opinion piece, but I'm not sure. The issue here is the scientific method. I've listed the things that are required, as I see it. And I've also listed the reasons why I haven't been able to verify it exists here, despite trying for two years. I'm glad that you like ML hacking, and I like it too. But models aren't a godsend; they're "the most basic, bare-minimum requirements of reproducibility." Your reaction shouldn't be "I'm incredibly grateful you'd be willing to do this." It should be "You're required to do this, because if I can't verify your claims, your claims might be mistaken." To leave it off on a softer note, normally I'd bond with you, ML hacker to ML hacker. Because I love ML, and I love hearing what you've been up to in ML. It's the best job in the world, as far as I'm concerned. (Could any other career give you the opportunity to be a developer advocate for high-performance computing in such an interesting way? https://github.com/google/jax/issues/2108#issuecomment-86623... Definitely looking for more examples of "Github Larping," if you know of any.) If you agree that the scientific method is the reason ML moves forward, all I'm doing here is protecting it. |
The scientific method is being followed here. Code is not needed for the scientific model to be followed. Even data. Literally every other field is able to advance without public code or data (in fact most areas of CS). There's absolutely no reason to believe that they won't release their code. They have a history of doing so. Models and checkpoints are not the bare-minimum for reproducibility. They describe their model enough in the paper. There's enough written in the paper (which is 30 pages) to reproduce the model. Will it be easy? No. But it can be done. And to be clear, I'm saying that the status quo of code being released is a godsend. This is not the norm in literally every other field/subfield. Code helps with reproducibility (and so should be encouraged) but is not required.
If you require someone else's code to reproduce results then you're not convincing me you're a good ML researcher nor programmer.