|
This isn't a criticism - I'm just curious to hear people's thoughts on this. When I look at this code, one of my initial reactions is that it does not seem to be very thoroughly tested. Sure, certain modules have been tested (e.g. `model.quat_affine`), but it's not clear how completely. Meanwhile, other modules, for example `model.folding`, have not been tested at all, despite containing large amounts of complex logic. Code that works with arrays is very easy to get wrong, and the resulting bugs are difficult to spot. My experience working with code written by researchers is that it frequently contains a large number of bugs, which calls the whole project into question. I've also found that encouraging them to write tests greatly improves the situation. Additionally, when they get the hang of testing they often come to enjoy it, because it gives them a way to work on the code without running the entire pipeline (which is a very slow feedback loop). It also gives them confidence that a change hasn't led to a subtle bug somewhere. Again, I'm not criticising. I am aware that there are many ways to produce high-quality software, and Google/DeepMind have a good reputation for their standards around code review, testing, etc. I am, however, interested to understand how the team that wrote this think about and ensure accuracy. In general, I hope that testing and code review become a central part of the peer review process for this kind of work. Without them, I don't think we can trust the results. We wouldn't accept mathematical proofs that contained errors, so why would we accept programs that are full of bugs? edit: grammar |