Hacker News new | ask | show | jobs
by agentultra 2898 days ago
I'd add a section on using TLA+ as a design tool. Diagrams and rules of thumb are useful but they don't catch errors or help you discover the correct architecture. See the Amazon paper [0] on their use of TLA+ in designing (and trouble-shooting) services.

[0] https://lamport.azurewebsites.net/tla/formal-methods-amazon....

3 comments

Why is an AWS paper on azure websites? :)
Redundancy.

Real answer though it's a Microsoftian's (that's not a word) website https://lamport.azurewebsites.net/

I feel TLA+ would be too much to ask in a system interview which is what this site is about.

In case anybody is interested, there is a nice talk by Hillel Wayne on youtube (https://www.youtube.com/watch?v=_9B__0S21y8) that provides a high-level overview on what TLA+ is about.

    I feel TLA+ would be too much to ask
I think so too but I suggest putting it in an article like this because I think TLA+ will have wider industry adoption. If candidates know it they could be better equipped to ask more interesting questions of their interviewers even if the position doesn't require knowledge of TLA+. And as the industry does adopt such practices it would be great to be prepared!
It might be too much to ask as an interviewer, but it wouldn’t be too much to offer as a candidate. Sit there submissively and only give me what I ask and you might impress me with you skills. Be proactive and get me engaged over something I haven’t seen before or that I wouldn’t expect you to know and you have a shot at wowing me with your potential.
Could you please talk about your experiences with TLA+? The "tools of thinking" for designing and verifying systems really interest me.
I didn't get any real interest in my office, but I applied TLA+ to a debugging challenge. I described a well-specified system (abbreviated, for instance data was just "DATA", not the actual myriad number of message types that could be sent). The intention was to understand a bug and possibly identify which of three suspected locations it was occurring in. I had enough of the specification for each part implemented:

Two sender/receivers (sending messages back and forth) and a data bus. For some reason, we seemed to be getting bad data (but not consistently). My "debugging" was actually more like "bugging". I took a correct TLA+ spec, and weakened constraints on different parts until I recreated the behavior we were seeing (it was the data bus). But the nice thing was being able to show that the particular bug couldn't happen from the sender/receivers. Their interaction with the data bus were correct (per the specification) and they were the only parts I could directly inspect (as I didn't have the files on how they implemented the data bus itself).

Once I found the right constraints on the data bus to weaken, I recreated the errors we were seeing in the model itself. This led to devising several more test programs that could more reliably produce the error. From there we were able to better communicate the problem with the contractors involved and get things corrected ("proof" that we weren't the problem).

A particularly nice thing was being able to model abstract versions of the system. I didn't need the fine details (what message specifically is being sent, didn't matter). But I also found I needed more details than my first pass and was able to refine the data bus specification (in TLA+) as needed to provide the necessary level of detail and extend it to add new capabilities.

That's awesome! I'm adding this write up to my collection of case studies. You should consider doing a blog post with a mockup problem really similar to illustrate that process with code and specs.
I think I lost the files but I will try to put something together.
Any suggested resources/books for getting a better understanding of TLA+?
I read through a chunk of Lamport’s book and the first 7 or so sections of his online course (the rest weren’t posted yet).

https://lamport.azurewebsites.net/tla/tla.html

https://learntla.com/introduction/

Is also a good introduction. That was enough to make me dangerous. Since work didn’t have an interest in training me, and I had other obligations, that’s all I’ve done so far.

hwayne, who did learntla.com, is getting close to finishing a book on it. You could start with his online tutorial but get the book soon as it's out. Here's his Twitter in case you want to watch for the book release:

https://mobile.twitter.com/Hillelogram

I could fill a blog post about it but in my current project we're using TLA+ for two things:

1. Helping us design features whose requirements are vague. The more hand-waving required to explain a particular feature the more likely we are to use TLA+ to model our assumptions and verify our understanding. This has led us to ask some interesting questions of our design team to help us build a better feature.

2. Requirements that are really hard that we need to ensure are implemented correctly. We use TLA+ to ensure the properties and invariants are correct with respect to the requirements and validate our model. This is really helpful in the case of concurrency and consistency. For our application we're using event-sourced data and it's imperative that our event store is consistent in the face of concurrent writers, can be replayed in a deterministic and consistent order, and that our assumptions will hold between versions of the events.

Ah - interesting! I would not have expected 1. I will dig into TLA+ and try to understand it better.

Thanks! Please do consider writing a blogpost, I'm sure many here would appreciate it.