Re-running is definitely too much work for most scientific papers, at least in ML and the computational sciences, where experiments might take thousands of core-hours or GPU-hours, but that's usually not necessary anyway. In addition, just running the code can catch the really bad problems (it doesn't work) but easily miss the subtle ones (it works, but only for very specific cases).

I think it's more important for reviewers to read the source, the same way one would read an experimental protocol and supplementary information, mainly checking for discrepancies between what the paper claims is happening and what is actually being done. In the above example, a reviewer reading the code would have spotted that the model isn't there at all, even though it runs fine.