Hacker News new | ask | show | jobs
by slowking2 277 days ago
Making invalid states unrepresentable may be a great idea or terrible idea depending on what you are doing. My experience is all in scientific simulation, data analysis, and embedded software in medical devices.

For scientific simulations, I almost always want invalid state to immediately result in a program crash. Invalid state is usually due to a bug. And invalid state is often the type of bug which may invalidate any conclusions you'd want to draw from the simulation.

For data analysis, things are looser. I'll split data up into data which is successfully cleaned to where invalid state is unrepresentable and dirty data which I then inspect manually to see if I am wrong about what is "invalid" or if I'm missing a cleaning step.

I don't write embedded software (although I've written control algorithms to be deployed on it and have been involved in testing that the design and implementation are equivalent), but while you can't exactly make every invalid state unrepresentable you definitely don't punch giant holes in your state machine. A good design has clean state machines, never has an uncovered case, and should pretty much only reach a failure state due to outside physical events or hardware failure. Even then, if possible the software should be able to provide information to intervene to fix certain physical issues. I've seen devices RMA's where the root cause was the FPU failed; when your software detects the sort of error that might be hardware failure, sometimes the best you can do is bail out very carefully. But you want to make these unknown failures be a once per thousands or millions of device years event.

Sean is writing mostly about distributed systems where it sounds like it's not a big deal if certain things are wrong or there's not a single well defined problem being solved. That's very different than the domains I'm used to, so the correct engineering in that situation may more often be to allow invalid state. (EDIT: and it also seems very relevant that there may be multiple live systems updated independently so you can't just force upgrade everything at once. You have to handle more software incompatibilities gracefully.)

2 comments

> For scientific simulations, I almost always want invalid state to immediately result in a program crash.

If you have actually made invalid states unrepresentable, then it is _impossible_ for your program to transition into an invalid state at runtime.

Otherwise, you're just talking about failing fast

> _impossible_ for your program to transition into an invalid state at runtime

Not the case for scientific computing/HPC. Often HPC codebases will use numerical schemes which are mathematically proven to 'blow up' (produce infs/nans) under certain conditions even with a perfect implementation - see for instance the CFL condition [1].

The solution to that is typically changing to a numerical scheme more suited for your problem or tweaking the current scheme's parameters (temporal step size, mesh, formula coefficients...). It is not trivial to find what the correct settings will be before starting. Encountering situations like a job which runs fine for 2 days and then suddenly blows up is not particularly rare.

[1] https://en.m.wikipedia.org/wiki/Courant%E2%80%93Friedrichs%E...

In those situations, the functions that contain a singularity should return an Either monad, and in order to bring the resulting data back into the bloodstream of the program, you have to deal with that potential singularity. Unfortunately, scientific computing seems like it's stuck in the age of the dinosaurs with tooling choices and much of the advancements in type systems of the last 40 years are nowhere to be seen. I always found that curious, given its adjacency to academia.
Why? You're just going to unwrap that monad and promptly crash. It's unrecoverable; it doesn't need to go back into "the bloodstream of the program." No amount of static typing will turn this runtime error into a compile-time error, so it doesn't really matter how you express it with types.
> You're just going to unwrap that monad and promptly crash. It's unrecoverable

You handle the error gracefully. It's not "unrecoverable" in the sense of an incorrect memory read that arises from a logic error in the program. It's an anticipatable behavior in a well-defined system of computations that should be treated as such. Simply crashing is extremely sloppy programming in this case, it's not a formal equivalent to discarding an unusable input.

> No amount of static typing will turn this runtime error into a compile-time error

On the contrary. Anything capable of statically checking dependent types can turn any runtime issue into a compile-time issue. Up to and including requiring proof that a function's domain is respected according to all paths that call it as a facet of the type system, and this domain can be inferred by the operations within the function itself.

> Up to and including requiring proof that a function's domain is respected

Does such a proof exist in this context? Or are we writing fanfiction about the problem domain now?

> It's not "unrecoverable"

Yes it is. The correct behaviour in that context is to terminate the program, which makes it unrecoverable.

> Invalid state is usually due to a bug

I don't think the article is referring to that sort of issue, which sounds fundamental to the task at hand (calculations etc). To me it's about making the code flexible with regards to future changes/requirements/adaptions/etc. I guess you could consider Y2K as an example of this issue, because the problem with 6 digit date codes wasn't with their practicality at handling dates in the 80's/90's, but about dates that "spanned" beyond 991231, ie 000101.