I wish people would stop with these shallow, boring dismissals. It's a research paper, not a product. While it's disappointing that they don't release more, it's good that they share their ideas.
A research paper by itself isn't worth nothing, sure, but without the ability to reproduce the paper or even check their results it's ... not worth much.
The architecture is described in enough detail to re-implement it and train it on known datasets/benchmarks such as VQA2. One man with a medical degree, Phil Wang ('lucidrains')[0], can reproduce most of these papers by himself. He has 246 GitHub repos[1], most of which appear to be reproductions of models described only in papers that had no associated code or models released, such as [2]. He often appears to release code within 2 weeks of a paper's publication on arXiv.
Because the authors don’t get a large reward for open sourcing the work and they stand to lose future value by lowering the gate to competition. You may want the code, but Google will not care (or it might dislike it).
Look at GPT-3+: OpenAI gets fame and fortune while people struggle to reproduce their last-gen models.
Probably because the research code is not as nice as one rewritten from scratch anyways, and it's using internal data sets / APIs.
They just want to get onto the next research instead of taking time to publish a clean open-source implementation, which can (and will) be done by somebody else anyways.
Researchers always have a lot to take away from Google papers that don’t release code or datasets - people understand that folks at companies sometimes cannot release the code or dataset. That doesn’t make any key contributions less meaningful - if that were the case, all conferences would have banned papers that don’t release code and datasets by now.
Conferences _should_ ban papers that don't release code or other means of reliable reproduction. The only reason they don't is because "research" in ML has more or less been a joke compared to any other established scientific field. And I'm not going to give Google the benefit of the doubt. At the very least I'll treat them like any random stranger publishing a paper. But in reality I treat their papers with a heavy critical eye these days because more often than not their research has turned out to be bunk and unreproducible.
Funny you compare ML with other fields - in my experience ML is the most open and reproducible area of scientific research, by far. Talk to researchers in other areas, many write “dataset and code available by request” but never share it, have custom CFS solvers and write papers with it but never release the code, and do experiments and leave out all details in the paper making it impossible to reproduce.
You are free to treat a paper from Google like a paper from any random stranger, sure. But it doesn’t change the fact that many ML researchers I know at my R1 university (and at many other top universities) always mention how much insight they get from these papers, even when the authors couldn’t release the code/models.
I think a large part of the innovation in ML research happens precisely because the code is released. The prevalence of a github.io page with the code, the paper, slides and a video presentation is amazing. I would love to see this practice extended to every other paper in every domain.
I think these are both important perspectives. There’s a real societal cost to having some of the best AI researchers wearing a golden gag, only able to share limited details about their work, the bulk of which remains behind closed doors. If we don’t impose social costs on this behavior, it will become even more common.