A few CS conferences have artifact evaluations, but almost all CS research doesn't actually get any sort of code review at all. No field is implementing the thing you are expecting.
It will be an uphill battle, no doubt, but I think the only alternative would be to share the whole dataset and have reviewers re-implement the analysis to confirm the results. That would also be a huge improvement, but it seems like a much bigger burden on reviewers.
Well, theoretically it already is, given that you normally have multiple authors and reviewers. It's just done poorly, just as a code review can be done poorly.