by chimi 2442 days ago
The problem is, it's impossible to know if they are accurate without doing everything the computer was coded to do for you.
2 comments

This is not entirely true. When I implement a simulation, I look for properties of the simulated system that can be checked, for example the time evolution of total energy or (angular) momentum. If you have a decent set of properties with non-linear relationships between them, it is actually quite hard to have a misbehaving simulation that still produces correct values for these properties. In fact, these checks have led me to discover bugs that would otherwise have been impossible to find because the sim output was just plausible enough.
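A minimal sketch of that kind of check, not taken from any real project: a 2D Kepler orbit (unit mass, GM = 1) integrated with velocity Verlet, tracking the drift in total energy and angular momentum. The `bug_scale` parameter is a hypothetical injected bug that mis-scales one force component; all names here are assumptions for illustration.

```python
import math

def accel(x, y, bug_scale=1.0):
    """Acceleration toward a fixed central mass with GM = 1.
    bug_scale != 1.0 mimics a subtle bug in the x-component of the force."""
    r3 = (x * x + y * y) ** 1.5
    return -bug_scale * x / r3, -y / r3

def energy(x, y, vx, vy):
    # Kinetic plus potential energy of the unit-mass body.
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)

def angular_momentum(x, y, vx, vy):
    # z-component of r x v for a unit-mass body.
    return x * vy - y * vx

def max_drift(steps=10000, dt=1e-3, bug_scale=1.0):
    """Integrate with velocity Verlet and return the largest absolute
    deviation of energy and angular momentum from their initial values."""
    x, y, vx, vy = 1.0, 0.0, 0.0, 1.0  # circular orbit of radius 1
    e0 = energy(x, y, vx, vy)
    l0 = angular_momentum(x, y, vx, vy)
    max_de = max_dl = 0.0
    ax, ay = accel(x, y, bug_scale)
    for _ in range(steps):
        vx += 0.5 * dt * ax           # half kick
        vy += 0.5 * dt * ay
        x += dt * vx                  # drift
        y += dt * vy
        ax, ay = accel(x, y, bug_scale)
        vx += 0.5 * dt * ax           # second half kick
        vy += 0.5 * dt * ay
        max_de = max(max_de, abs(energy(x, y, vx, vy) - e0))
        max_dl = max(max_dl, abs(angular_momentum(x, y, vx, vy) - l0))
    return max_de, max_dl

good_de, good_dl = max_drift()
bad_de, bad_dl = max_drift(bug_scale=1.01)
print(f"correct: dE={good_de:.2e} dL={good_dl:.2e}")
print(f"buggy:   dE={bad_de:.2e} dL={bad_dl:.2e}")
```

With the 1% force error the trajectory still looks like a plausible orbit, but the force is no longer central, so the angular momentum drifts by orders of magnitude more than in the correct run.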
This really rings true for me. I've also found that looking for several "known" properties within a complex model's behavior is an effective way of rooting out subtle bugs. If the model isn't keying in on the obvious, it hardly has a chance of keying in on subtle unknown relationships. I've even gone so far as to optimize hyperparameters based on context-specific properties, e.g., hunting for the model whose output behaves most coherently across a range of inputs (assuming that outputs are expected to vary continuously).
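One way the "most coherent output" idea might look in code, as a rough sketch only: score each candidate model by the largest jump between outputs at neighboring inputs on a fine grid, and prefer the lowest score. The `roughness` function and the two toy models are hypothetical stand-ins, not anything from an actual pipeline.

```python
import math

def roughness(model, lo=0.0, hi=1.0, n=200):
    """Largest absolute change in output between neighboring grid inputs.
    A model that is expected to be continuous should score low."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [model(x) for x in xs]
    return max(abs(b - a) for a, b in zip(ys, ys[1:]))

def smooth_model(x):
    # Toy candidate with continuous output.
    return math.sin(2 * math.pi * x)

def jumpy_model(x):
    # Toy candidate with an unphysical discontinuity at x = 0.5.
    return math.sin(2 * math.pi * x) + (0.5 if x > 0.5 else 0.0)

candidates = {"smooth": smooth_model, "jumpy": jumpy_model}
scores = {name: roughness(model) for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

A score like this only makes sense next to an accuracy metric, since a constant model is perfectly smooth and perfectly useless.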

https://sproutling.ai/blog/harvest-simulations?jm

https://sproutling.ai/blog/growth-simulations?jm

Well, the premise stated above is that whatever testing you, the "author", judge sufficient doesn't matter; accuracy comes only at the hands of the sacred code review!
I dare you to review dense code for numerical computations and actually spot bugs. This is really hard! Unit tests are actually much more reliable, but they are limited to deterministic algorithms and to models of reasonable complexity, that is, cases where it is viable to compute expected results by alternate means.
Code review is more about determining if you have the correct test cases to cover the algorithm, and a solid architecture for maintenance, than it is about algorithmic correctness.
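To make "compute expected results by alternate means" concrete, here is a small sketch: a composite Simpson integrator checked against integrals with known closed forms. The function names and tolerances are assumptions chosen for this example.

```python
import math

def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate weights 4 (odd index) and 2 (even index).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def test_simpson_against_closed_forms():
    # Integral of sin(x) over [0, pi] is exactly 2.
    assert math.isclose(simpson(math.sin, 0.0, math.pi), 2.0, rel_tol=1e-7)
    # Simpson's rule is exact for cubics: integral of x^3 over [0, 2] is 4.
    assert math.isclose(simpson(lambda x: x ** 3, 0.0, 2.0), 4.0, rel_tol=1e-12)

test_simpson_against_closed_forms()
```

This works because the closed form is an independent source of truth. For a model with no such reference, the test has nothing to compare against, which is the limitation described above.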

Code review is a tool to push back on your manager ignoring testing: “Steve requested I add X tests.”

That's not necessarily true. We could start building correctness or equivalence proofs for the building blocks of research software, and maybe some day we could prove some meaningful equivalence between how the software is described and how it actually works.