Hacker News new | ask | show | jobs
by skybrian 408 days ago
While I generally agree with defining new types to assert that validation has been done, I think your blog post could have explained more about what kinds of validation are practical to do. For example:

> Address that represents a “street address” that has been validated by your street address to exist

What does it even mean to verify that a street address exists? Verifying real-world relationships is complex and error-prone. I’m reminded of the genre of articles starting with “Falsehoods programmers believe about names” [1].

In practice, the rules that can be enforced are often imposed by the system itself. It simply doesn’t accept data that isn’t in correct format or isn’t consistent with other data it already has. And we need to be cautious about what harm might be caused by these rejections.

Having the validation logic in one place will certainly help when fixing mistakes, but then what do you do about data that’s already been accepted, but is no longer “valid?” This sort of thing makes long-running systems like databases hard to maintain.

[1] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

2 comments

> I think your blog post could have explained more about what kinds of validation are practical to do

Perhaps following the two links with the word "valid" in them to will answer your concerns: https://jerf.org/iri/post/2023/value_validity/

Note that article does explicitly have the sentence "Let’s forget the Umptydozen Falsehoods Programmers Believe About Addresses for the sake of argument and stipulate a perfect such function." These are examples. Trying to write "here's how to validate everything that could ever happen and all also here's a breakdown of all the falsehoods and also here's how it interacts with all your other logic" is not exactly a blog post so much as a book series. It turns out that even if you completely ignore the Umptydozen falsehoods of all the various kinds, you still have plenty of validation problems to talk about!

However, the in-a-nutshell answer to "how do you handle time invalidating things" is that you treat your database as an untrusted store and validate things as needed. I'm actually an 80/20 guy on using databases to maintain integrity for much this reason; I love me some foreign keys and I use them extensively but the truth is that that is only a partial solution to the data validity problem no matter how clever you get, and temporal inconsistency is a big one. Once you have any source of inconsistencies or errors in your DB, a whole whackload of need for validation and care basically comes dropping in all at once, or, to put it another way, if you're not 100.00000% successful at maintaining data integrity, the next practical option is 95%. There is no practical in-between, because even that .001% will end up driving you every bit as crazy as 5% being wrong in most ways.

But that's also out-of-scope for blog posts targeted at people who are only doing ad-hoc validation whenever they are forced to. Learn how to validate properly at all, then get better when you have a time-based problem.

Good article. Yeah, I wouldn’t expect a full explanation, just some kind of “here be dragons” caveat. Perhaps a hyperlink alone is a bit too subtle since readers aren’t always going to dereference it. (And there’s some irony there, given the subject of the linked article.)

The types in Go’s template/html package are a pretty interesting example of using types tactically to indicate validity. The HTML type is used to turn off HTML escaping when it’s already been done. It’s using a type as a loophole. It’s still very useful to have a type like that when reviewing code for security bugs, because you know where to look for problems. Unsafe sections in Rust serve a similar purpose.

Types are about creating trust, and this trust is often short-lived. When data crosses a security boundary, the validation has to done again.

Far too many programmers forget that time passes.
Yeah, the issue is that proofs are harder than people think, even for trivial things (try a few easy leetcode problems in Lean with a proof of correctness), and less useful than people think, especially when you get past low level algorithms and into domain logic (your point exactly). They also don't serialize well, so a database or API call with a "proof" field would be susceptible to fudging, or the definition could change between deployments. They're also easy to make incompatible: one library requires bounds proofs for signed ints, but your app uses unsigned ints, so you have to either rewrite your app or rewrite the library, or cast, in which your type system has to handle like a "checked exception" and propagate that possibility throughout your whole type domain and app logic.

I'm pretty convinced there's a good reason that while "propositions as types" is cool in theory, it's unlikely we'll ever see it catch on in practice.

> try a few easy leetcode problems in Lean with a proof of correctness

This might actually be harder than, say, proving some undergraduate math theorems because reasoning about mutable state (especially when you want it to be performant) is tricky. I might guess that it could be easier to use a model checker such as TLA+ for that (although that can only verify algorithm descriptions and not implementations) because they seem to be more built with these things in my mind, but I lack enough experience with it to be certain.

> …especially when you get past low level algorithms and into domain logic (your point exactly). They also don't serialize well, so a database or API call with a "proof" field would be susceptible to fudging…

Nonsense.

Proving things about low-level algorithms that are full of complicated behaviors involving shared mutable state is often more difficult than proving things about high-level domain logic; regardless, people still do it, and if you use any modern CPU, especially from the ARM family, you benefit from their work.

A proof serializes just as well as any other kind of program — after all, the first step in checking a proof involves deserializing the proof, or in other words, reading and parsing its source code.

As for “fudging”; proofs are guaranteed to be correct up to the correctness of the proof checker, which can be used to recheck the proof at any time.

For more information, see Geuvers (2009) “Proof assistants: History, ideas and future”.

https://sci-hub.st/https://link.springer.com/article/10.1007...

> Proving things about low-level algorithms [...] more difficult than proving things about high-level domain logic

I'm not sure I buy that. Like, prove that you never double-charge a sale, or a deactivated user can never make a purchase. Even if all that logic is within one service, it could be separated by multiple layers, callback functions and redirection, parallel tasks, database calls scattered between, etc. And even once you do have a proof, a single line code change anywhere between here and there could easily force you to redo the whole thing. And that doesn't even get into the more common case of distributed services.

> A proof serializes

We're probably talking about different things. I'm imagining something like a proof of the optimal traveling salesman between a bunch of cities. (Note, optimal here; verifying that a solution is below a specified limit can be done in polynomial time, but verifying optimality cannot). Say you want to store the answer in a database for later lookup. But it'd be possible that the DB could get fudged, and so "proof"-wise couldn't be trusted. Thus anything requiring a type-level proof would have to solve it from scratch every time. That's what I mean by "they don't serialize well" (though some proofs, like "N is not prime" can be serialized by storing the factors). Of course you could work around it and add "assuming our database has not been fudged" to the axioms, but the second you manually update one record in your database, the house of cards comes tumbling down and you can no longer trust that any of the proofs that you've worked so hard to build still apply in your system.