| Some thoughts.
1/ I think that it's not always possible to modify the domain. For example, I could have a function that takes a file name as a parameter and returns a CanBeWritten object. I could then have a function that opens a file in write mode and takes an object of this type as a parameter. The issue is that between the moment I acquire this object and the moment I use it, the file could, in fact, become non-writable.
(There was a post on HN about this idea of using the type system like this: https://news.ycombinator.com/item?id=35053118 ). I think you focus a lot on software issues and neglect the hardware ones. But it's a choice. Still my thoughts (but at this point you already understood that it was going to be like that for the entire post): I think that when a fault is detected (when it becomes a failure, if I follow your definitions), an attempt to fix the problem and return to a normal state can itself fail, by fixing the issue incorrectly. For example: you store the same integer three times (redundancy) and one of the copies has a bit flipped.
You decide that the one that differs from the other two is the incorrect one.
You detected a problem, you tried to fix it. But it could be the case that two bit flips occurred at the same position. There is no definitive solution to that, but documenting all the detected problems AND the fixes applied to them would help. And for the error messages ... Well, my position is that most of the time they are useless for the end user. They can be useful for the developer. For the end user, the best error message (if such a message is required) is something unique enough to be copy-pasted into Google to find a solution that the user will not understand but will be able to apply. I used to consider (when I started computer science) that an algorithm is like going from point A to point B on a city map. There is essentially one "good" path and a huge quantity of "wrong" paths where you can get lost. And by trying to find your way, you can make the situation even worse. |
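The triple-redundancy scenario in the comment above can be sketched roughly as follows. This is only an illustration: the `vote` function, the `tmr` logger name, and the sample values are all made up for the example. It picks the majority value among three copies, logs both the detected disagreement and the repair it applied, and then shows how two flips at the same bit position make the vote confidently pick the wrong value.

```python
import logging

# Hypothetical logger name; any logging setup would do.
log = logging.getLogger("tmr")


def vote(a: int, b: int, c: int) -> int:
    """Return the majority value among three redundant copies.

    Raises ValueError when all three copies disagree (no majority).
    """
    if a == b == c:
        return a
    if a == b or a == c:
        winner = a
    elif b == c:
        winner = b
    else:
        raise ValueError(f"no majority among {a}, {b}, {c}")
    # Record the detected problem AND the fix that was applied.
    log.warning("copies disagreed: %r; repaired to %d", (a, b, c), winner)
    return winner


original = 0b1010
flipped = original ^ 0b0100  # same value with one bit flipped

# One corrupted copy: the vote recovers the original value.
one_flip = vote(original, flipped, original)

# Two copies corrupted at the same bit position: the vote "repairs"
# the only correct copy and returns the corrupted value instead.
two_flips = vote(flipped, flipped, original)
```

Here `one_flip` equals `original`, while `two_flips` does not. The log line is the only trace that a repair happened at all, which is why keeping it matters.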
1 - Yes, when it comes to things that touch the hardware or the OS, it's hard to encode these things at the type-system level since they can change out from under you. This is a great example where it is useful to handle some faults at the type level (i.e., file might be missing, remember to check) while handling others as failures (file became read-only out of nowhere... better abort what I was doing).
2 - Yup, trying to fix errors often makes it worse, which is why simply restarting is often the best way to go :)
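To make point 1 concrete, here is a minimal sketch of the CanBeWritten idea from the comment, under several assumptions: `check_writable` and `write_text` are hypothetical helpers, the check only looks at the parent directory, and deleting the directory stands in for "the file became non-writable". The type forces the caller to do the check first, but the write still has to treat an OS error as a failure, because the world can change between the check and the use.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CanBeWritten:
    """Evidence that `path` looked writable at the moment of the check."""
    path: str


def check_writable(path: str) -> Optional[CanBeWritten]:
    """Hypothetical check: the parent directory exists and is writable."""
    parent = os.path.dirname(path) or "."
    if os.path.isdir(parent) and os.access(parent, os.W_OK):
        return CanBeWritten(path)
    return None  # fault handled at the type level: caller must check


def write_text(token: CanBeWritten, text: str) -> bool:
    """Write `text`; report failure if the token has gone stale."""
    try:
        with open(token.path, "w") as f:
            f.write(text)
        return True
    except OSError:
        # The earlier check no longer holds: treat it as a failure.
        return False


workdir = tempfile.mkdtemp()
token = check_writable(os.path.join(workdir, "out.txt"))
assert token is not None

first = write_text(token, "hello")   # nothing changed since the check

# The world changes underneath us: the directory disappears.
os.remove(token.path)
os.rmdir(workdir)

second = write_text(token, "hello")  # same token, but now stale
```

The first write succeeds and the second fails, even though both use the same `CanBeWritten` value. The token only proves what was true when it was created.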