by platz 3376 days ago
I continue to believe, even as a static typing fan, that static types are fundamentally incompatible with OTP and its goals.

Distributed systems just seem too thorny for static types to subjugate/bend to their will.

Sure, you can declare global invariants ahead of time that your cluster must uphold, but it's a bit less "distributed" in a real sense then.

8 comments

You have to model sending a message across the cluster as marshaling into a binary form and unmarshaling it again. I don't mean that you "should" model it that way... you have to model it that way, because that's what is happening. Therefore, when receiving a message, you really only ever get a Maybe Message or Either Message Error or whatever you want to model it as. The act of unmarshaling the message back into the local representation is also when you check whether it conforms to the type restrictions you think it should have.
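
A minimal sketch of that idea in Python, assuming a JSON wire format and a made-up `Ping` message type (both are illustrative assumptions, not anything from Erlang or a real library). The point is only that decoding returns either a typed message or an error, never a bare message:

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ping:
    """Hypothetical message type, used only for illustration."""
    sender: str
    seq: int


@dataclass(frozen=True)
class DecodeError:
    reason: str


def decode(raw: bytes) -> Union[Ping, DecodeError]:
    """Unmarshal bytes from the wire; the result is a Ping or an error."""
    try:
        obj = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return DecodeError(f"not valid JSON: {exc}")
    if not isinstance(obj, dict):
        return DecodeError("expected a JSON object")
    sender, seq = obj.get("sender"), obj.get("seq")
    if not isinstance(sender, str):
        return DecodeError("field 'sender' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(seq, int) or isinstance(seq, bool):
        return DecodeError("field 'seq' must be an integer")
    return Ping(sender, seq)
```

Every caller of `decode` has to handle the `DecodeError` branch before it can touch a `Ping`, which is the Either Message Error shape described above.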

Because you must already model this as a process that can fail, I don't think it does break the static typing model at all. In fact I routinely "statically type" messages coming from things that were actually emitted by dynamic languages!

What gets tricky is if you try to model this as a process that can't fail. But the problem there isn't static typing, it's a specific instance of the general principle that you cannot build robust systems based on the principle that networks can't fail.

I also think this is an instance of the general misunderstanding about static types, which I understand deeply because I once held it, that static types somehow prevent errors. They don't. What they do is provide a gateway that says "in order to get into this type, you must meet these criteria, and the compiler is going to statically check that you've verified these criteria". A static typing system doesn't force things through that gateway, it forces you to check whether things fit through that gateway, and do something with the things that don't. Then, it also allows you to strictly declare that everything that uses that type is statically checked to be "behind" that gateway, so there are no other ways around it to get in, thus creating a space in which you can count on the fact that the values have been checked for certain properties and you can now write code that counts on those without constantly checking them. A statically typed system faced with the task of, say, parsing a number out of a string, does not prevent a user from sending me a string of "xyz"; it just prevents me from just sending it through the system as-is.
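
The "gateway" can be sketched in a few lines of Python, with the caveat that Python only enforces these annotations through an external checker such as mypy, and that `Quantity`, `parse_quantity`, `double` and `handle` are names invented for this illustration:

```python
from typing import NewType, Optional

# A distinct static type: a checker such as mypy treats Quantity as
# different from str, so raw input cannot be passed where Quantity is required.
Quantity = NewType("Quantity", int)


def parse_quantity(text: str) -> Optional[Quantity]:
    """The gateway: by convention, the one place a str becomes a Quantity."""
    try:
        return Quantity(int(text.strip()))
    except ValueError:
        return None


def double(q: Quantity) -> int:
    """Code behind the gateway can rely on q being a number."""
    return q * 2


def handle(text: str) -> str:
    q = parse_quantity(text)
    if q is None:
        # The caller is forced to decide what happens to bad input.
        return "rejected"
    return f"doubled: {double(q)}"
```

Nothing stops a user from sending "xyz"; `handle("xyz")` simply takes the rejection branch, while `double` never has to re-check its argument.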

> everything that uses that type is statically checked to be "behind" that gateway

In a distributed system, the largest the "gateway" can reliably be is a single node, because you don't get guarantees about the code that other nodes in the system are running. Even the single node case poses difficulties, because I believe in OTP the upgrade path means you have to transfer state during upgrades. What if the types of the state during the upgrade don't exactly match? Can multiple types of a thing exist simultaneously? How are these types versioned? etc... it gets complicated.

> Therefore, when receiving a message, you really only ever get a Maybe Message or Either Message Error or whatever you want to model it as.

Sure, you can receive messages as "Object" and then cast/parse them inside the node. Does that mesh with the vision of what people have when they want to bring static typing to erlang?

---

The hard part about thinking about OTP is not just the message passing, but also the myriad deployment & upgrade & versioning scenarios.

I am a fan of static typing over dynamic typing in everything else, i.e. normal programs... just not _OTP-style_ erlang for distributed systems.

Even thinking about something like a gen_server (http://erlang.org/doc/man/gen_server.html) makes my head hurt... though if someone can figure out a way to do it that's faithful, more power to them.

> Can multiple types of a thing exist simultaneously? How are these types versioned? etc... it gets complicated.

I don't use Erlang, but I have developed an Actor system for C# [1] which is based on its (and Akka's) concepts. Clearly without a static type-checker for the whole distributed system we have to manually get involved and patch the old and new so that we can hot swap processes. Versioning I've found is best done by maintaining the old process that accepts the old message format, maps it to the new one, and then forwards it on to the new process that accepts the new message format. Any other node that is lagging behind will continue to work, and any new one will send to the new address for the process.
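
A rough sketch of that versioning pattern in Python. This is not the echo-process API; `OrderV1`, `OrderV2`, the two handler functions and the "USD" default are all hypothetical names and values chosen for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrderV1:
    item: str
    quantity: int


@dataclass(frozen=True)
class OrderV2:
    item: str
    quantity: int
    currency: str


handled: List[OrderV2] = []  # stands in for the new process's real work


def orders_v2(msg: OrderV2) -> None:
    """New process: accepts only the new message format."""
    handled.append(msg)


def orders_v1(msg: OrderV1) -> None:
    """Old process kept alive: maps the old format and forwards it."""
    # Assumed default for the field that old senders cannot supply.
    orders_v2(OrderV2(msg.item, msg.quantity, currency="USD"))
```

Nodes that lag behind keep sending `OrderV1` to the old address, upgraded nodes send `OrderV2` to the new one, and both end up in the same handler.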

This isn't really rocket science, and if you stick to a few basic rules it tends to work out just fine. That doesn't mean that type safety goes out of the window, it just means that in creating a distributed process you must accept that you can't retire the old contract without it causing potential problems.

Apologies if I'm missing your point about OTP, but ultimately it seems that at some point (as the GP says) you are marshalling a message into a text or binary format, and then unmarshalling. At that point if the unmarshalled static type doesn't match the type that the process expects, then it will be off to the dead-letter queue. I don't really see how that's any different to giving the wrong type to a function in a dynamic language, or using an incorrectly typed variable that is picked up by a compiler in a statically typed language. In each case it's type checking at the earliest possible opportunity.

[1] https://github.com/louthy/echo-process

'Sure, you can receive messages as "Object" and then cast/parse them inside the node. Does that mesh with the vision of what people have when they want to bring static typing to erlang?'

No, that's not how you do it. You marshal things directly into the desired types. Check out either aeson for Haskell or how Go does things via either the json modules or the generic Text/Binary Marshaler/Unmarshaler.

"but also the myriad deployment & upgrade & versioning scenarios."

The answer to all of those things is mostly that even a lot of Erlang shops don't use live upgrading. You really have to have a very particular use case for that to be the best solution vs. a rolling upgrade and server restarts. Even if the language is capable of it, it still requires you to write services that can handle being upgraded, and it's much easier to write services that can handle being restarted, especially since you 100% have to write that anyhow because services get restarted anyhow. Most people don't have that use case. Web services certainly don't have that use case.

Once you drop that, it's a lot simpler.

"Even thinking about something like a gen_server"

gen_server is partially as complicated as it is as a side-effect of other decisions in the language. While the concept of a gen_server is a strength in Erlang, the specific implementation of gen_server as this "behavior" thing is mind-blowingly complicated for what you actually get. (It reminds me of Python's "metaclasses". I spent many hours wrapping my head around what that was, but in the end, all that it amounts to is what is now called a class decorator, which is way more sensible. A metaclass isn't a class decorator in theory, but in practice, class decorators are way easier to understand and cover 99.9% of the use cases, if not 100%.) When I implemented supervisor trees in Go, my solution for gen_server/gen_fsm/gen_* was just to... not. Behaviors are just a very, very weird half-object-ish system with a lot of limitations. They are easily replaced by simply having some sort of "interface" system, be it via conventional classes or interfaces. It's why you don't see "behaviors" as Erlang defines them anywhere else. Erlang has a lot to learn from and copy from, but that part isn't it.

After using hot code loading in production for the last 5 years, I don't see why you wouldn't use it, when it's right there. Maybe it's less thought to do a rolling restart, but it's a lot more effort expended by everything in the system to rebuild all the state that was in your processes.

A behavior is simply a list of functions you've declared that your module will export -- and a convention on what they might do. gen_server.erl is going to make lots of callbacks into your code, and rather than pass a huge list of funs, instead we pass the module name, and gen_server calls the exported functions from that module (this style means all the callbacks will hit your new code if you hot load, without you doing anything special; processing type changes is up to you, of course)
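
As a loose analogy in Python (not how the BEAM code server actually works), the idea of passing a module name instead of a list of funs might look like this. `CounterBehaviour`, the `loaded` table and `GenericServer` are invented for the sketch:

```python
from typing import Dict, Protocol, Tuple


class CounterBehaviour(Protocol):
    """The 'behaviour': callbacks a callback module promises to provide."""
    def init(self) -> int: ...
    def handle_call(self, request: str, state: int) -> Tuple[int, int]: ...


class CounterV1:
    def init(self) -> int:
        return 0

    def handle_call(self, request: str, state: int) -> Tuple[int, int]:
        new_state = state + 1
        return new_state, new_state  # (reply, new state)


class CounterV2(CounterV1):
    def handle_call(self, request: str, state: int) -> Tuple[int, int]:
        new_state = state + 10
        return new_state, new_state


# Stand-in for the code server: module name -> currently loaded code.
loaded: Dict[str, CounterBehaviour] = {"counter": CounterV1()}


class GenericServer:
    """Holds a module *name*, not function references."""
    def __init__(self, module_name: str) -> None:
        self.module_name = module_name
        self.state = loaded[module_name].init()

    def call(self, request: str) -> int:
        # Looked up by name on every call, so newly loaded code is used.
        module = loaded[self.module_name]
        reply, self.state = module.handle_call(request, self.state)
        return reply
```

Replacing `loaded["counter"]` with a `CounterV2()` changes what the next `call` does while the server keeps its state; converting that state if its type changed is still up to you.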

I know how aeson works, but the details of how it parses text into a HashMap that you can extract fields from into a data structure are somewhat beside the point. I'll grant you that there are static solutions for message passing, sure.

It seems odd to me that they would include a unique feature like live-updating if it shouldn't be used.

I grant that live-updates and gen_servers may be anti-patterns, but my assumption was to consider the effects of static types on OTP and these are part of it.

If you identify some subset of erlang+OTP that is easier in some ways, great, I'm all for it.

I am just pointing out some complexities without making assumptions about what should be included or discarded. ( I do not know what erlang shops do in the small or in the large).

Perhaps what we want then is static types for "OTP-Lite"

I think even live-updating could be statically typed. Basically the live-update is a collection of functions that map every data type in the old process into the corresponding type in the new process. In the dynamically-typed case, these functions are just the identity. In the statically-typed case, if the new type has a new attribute, your mapping function has to define a reasonable default value. If you can't do that, your dynamic live-update would have gone badly anyway.

The main problem with pushing a typechecked live-upgrade in one shot is that you'll need to put a big lock around the distributed system. (A non-upgraded node messaging an upgraded one would be fine, because the upgraded one knows the conversion function, but what happens in the reverse scenario?)

It could be done without a big lock by splitting into three steps:

1) Push an upgrade that changes the types and adds the conversion functions. The valid type is the union of the old type and the new type. Wait until all nodes complete the upgrade.

2) Push an upgrade that instructs the nodes to convert their data and start using the new types by default. Wait until all nodes complete the upgrade.

3) Push an upgrade that removes the old types and conversion functions.
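
The three steps above can be sketched in Python as follows. `UserV1`, `UserV2`, the empty-string default and the `use_new_types` switch are assumptions made for the example, and the `downgrade` function is an addition of this sketch to cover traffic toward nodes that have not yet switched:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class UserV1:
    name: str


@dataclass(frozen=True)
class UserV2:
    name: str
    email: str


# Step 1: every node accepts the union and knows the conversions.
User = Union[UserV1, UserV2]


def upgrade(user: User) -> UserV2:
    if isinstance(user, UserV2):
        return user
    # Assumed default for the attribute the old type lacks.
    return UserV2(user.name, email="")


def downgrade(user: User) -> UserV1:
    if isinstance(user, UserV1):
        return user
    return UserV1(user.name)


# Step 2: a per-node switch decides which representation is the default.
def normalize(user: User, use_new_types: bool) -> User:
    return upgrade(user) if use_new_types else downgrade(user)

# Step 3 (not shown): once every node has flipped the switch, delete
# UserV1, downgrade and the Union.
```

At no point does a node receive a value outside the type it was compiled against, which is what removes the need for the big lock.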

> [...] etc... it gets complicated

Which is exactly why we want to employ static types: in order to catch the difficulties in implementing it correctly. We describe the complications in the type system, through a model that captures them, to allow compiler errors -- rather than runtime errors -- to guide us in implementing it correctly.

Types only hinder getting an invalid program to compile -- which is exactly what we want.

In general, sure - but this post is about erlang/OTP, and the way you're speaking in generalities makes me think that you're just trying to persuade me about and champion the value of static types in general.

To digress slightly, consider an example from another domain, although I would rather keep this discussion about erlang. Now, Haskell is the only well-known language that has lazy (non-strict) semantics. Over the years many folks have proposed to make Haskell strict by default, alleviating some of the headaches that occur from non-strict evaluation. However appealing that may be, it would be a sad day if that occurred, because we'd lose the only language to understand how lazy/non-strict evaluation affects how we design programs while there are countless strict languages, and lazy/non-strict evaluation has some very nice properties indeed.

Now to bring this back to erlang/OTP... sure, it is very nice when we add static types to erlang because we get all the nice things that static types provide, but we also lose some things. There are some features in erlang/OTP that are very dynamic, and forcing a static type system simply kills those features. I think that would be a sad day for erlang, because you'd lose the ability to design distributed systems utilizing the full range of behaviors that the erlang/OTP system offers. There are already other actor systems in the world that offer static typing. You don't need erlang to build those systems. There is only one erlang/OTP, and it has some very unique features that none of the others have.

Say, if we're talking about javascript, which runs at the level of a program on a single machine, I say bring on the types. If we have some other statically-typed actor system that works well for certain use cases, great. If we're talking about erlang/OTP, which is designed specially for fully distributed systems, I say let it be.

Would you mind elaborating, or sharing some papers on the subject? I'm particularly interested in a dialect of ReasonML that would use the BuckleScript compiler + ConcurrentML but target the BEAM VM, and I'd love to know how bad of an idea it might be. Maybe because it lacks e.g. session types it's hopeless, but I'm not sure.

So, would love to hear specifics!

There are parts of OTP patterns that seem inherently dynamic. Message passing is only one aspect. There are also deployment/upgrade concerns with a running system.

Actors can receive messages that change their behavior entirely ( http://erlang.org/doc/man/gen_server.html ). Features like this are not there by accident.

Actors can hot-upgrade their code dynamically while the process is running. For example, if an actor is hot-upgrading, I'm not sure how it would work if the types of the old state and the new state don't exactly match. Sure, you could write functions to do this, but you see the picture is much more complicated.

I don't think I've presented the best arguments off the top of my head here, but if you think more about the deployment/upgrade scenarios, along with partial updates on only certain nodes of the system, you can see how complex it could get.

Basically, never assume that you get to take the whole cluster down to do an upgrade. Comprehensive "red/black" deployment strategies used by other non-distributed languages are not really the OTP way of doing deployment/upgrades.

Having version N of a struct and version N+1 of a struct in flight at the same time is almost impossible with current statically typed languages. In a dynamically typed language, as long as the contents of version N+k struct are additive and don't change the semantics, old code can read new data.

What needs to happen is both: immutable code, and versioned structs with pure functions that can upgrade and possibly downgrade structs as needed. The larger the distributed system, the more versions of a struct (message) will be in flight at a time. Services need to contain no state, so that they can be micro-rebooted and brought up with the new version.

Joe Armstrong had a comment on globally accessible but immutable code, which I think would go a long way towards the ability to statically type the inputs to a function in a distributed system. Interposition and routing would be the only way to upgrade or deprecate old code paths.

Well, the first question you have to answer is simple: what is the type of self(), your own pid? This is a really hard problem, and it is what stopped SPJ in the late 90s/early 2000s.

Then the second problem is that at any given time you can receive a message from another node/process 10 years in the future compared to you, whose code and types you know nothing about. How do you type check it?

Finally, the actor model in general allows unbounded nondeterminism. This is not really something you can build into a static type checker.

The "easy" solution is to make messages an opaque black box that can be anything... but at that point you are leaking static typechecking everywhere.

> Distributed systems just seem too thorny for static types to subjugate/bend to their will.

The more I've learned to leverage types, the more I realize that it's my limited knowledge of type systems that prevents me from expressing something in it. Types do not bend to the will of programs; programs bend to the will of types (in statically typed languages).

> Sure, you can declare global invariants ahead of time that your cluster must uphold, but it's a bit less "distributed" in a real sense then

I don't understand. The components of distributed systems communicate via protocols. What prevents the implementation of these protocols from leveraging type safety, thus transforming a runtime error into a compile-time one?

Static typing is about catching programmer mistakes, by communicating your intent to a compiler -- "I expect the type of this to be a Maybe Int, fail if that's not the case". There's no essential difference between a test informing you that a value-level property doesn't hold up at runtime, and a type error, informing you that a type-level property doesn't hold up at compile-time.

> What prevents the implementation of these protocols from leveraging type safety

Global invariants of a running distributed system are different than local invariants in a single program that you can stop, deploy re-compiled binaries to, and then start again.

Now, you can use static types in actor systems, and some such systems exist. These typed actor systems don't do all the same things that erlang/OTP does (that may be ok - maybe you don't need them). If your use case fits into what the typed actor systems provide, by all means, one of those is probably a better fit for you.

I suggest you look into Session Typing, specifically Multiparty Session Types as these provide a refreshing approach to the problem of communicating threads. A lot of it is still kind of experimental but there's some good traction being made for sure and it's probably as expressive as you'd need to get to model 95% of the type information in an Erlang program. Type inference obviously isn't a choice yet, but I think a good language offering some of these features on the BEAM VM is all that is needed to make them hit the mainstream and actually get used for real software so that more work can go into the theory, etc. The problem is being solved on the bleeding edge of things, just not as fast as Erlang itself is progressing.

While I agree that the OTP perhaps is not as easily statically typed, since it was built with Erlang in mind, I do think that static typing adds a layer of robustness to distributed systems, especially if you design it that way upfront. In my experience the problem comes when you try to apply static typing to a dynamic system.

> In my experience the problem comes when you try to apply static typing to a dynamic system.

As in, because it compiles down to a dynamic system, it's no good?

There are plenty of languages that give us strong static guarantees and compile down to dynamic or untyped languages. Look at Purescript, Elm, etc. They all do quite well compiling down to JS.

Don't forget that assembly isn't strongly typed either, and most languages compile down to that. I don't see anything wrong with a static typed layer that compiles to dynamic code, the interface you're providing is still type safe.

In regards to Purescript/Typescript, they're both statically typed and that results in friction when trying to integrate with the existing JavaScript ecosystem/libraries. Erlang/OTP might be different, but there will probably be situations where the type system is either incompatible with a certain library, or the type system is made less strict (e.g. an any type).

That wasn't what I got out of the previous post, it seemed to be saying there was something inherently unsafe about compiling down to a dynamic language.

I tend to agree with you. Message passing and static types don't mesh unless there is some type of contract between the sender and receiver. It would be a nightmare.

How does dynamic typing help if there is no contract between sender and receiver? Even in that case they must agree on the content of message.

Even if you insist on keeping the message untyped, with a static type system one could always convert (and possibly reject) messages as soon as they are received into a more precise type. That would keep the code that the compiler can't verify to the edges of the system.

Distributed systems are not like traditional programs, because there is not just one "edge of the system".

Every node becomes an "edge" in its own right, and doesn't necessarily have global coherence with the rest of the system.

True, but if the sender wants the receiver to do something of value then it will need to meet a contract that the receiver enforces. That doesn't require a central repository of contracts, one node can diverge, but you must understand that parts of your network of services will start to fail. From that point of view it starts to look very much like the linker phase of a compilation, and that the types need to match up to the data structures being instantiated. It's just that this 'linking' phase is in the programmers' heads, and not particularly useful. A distributed system that can validate itself is a much more valuable concept.

Absolutely, the rubber meets the road at some point, nodes must understand/assume "contracts" about the data they are working with.

There already are static typed actor systems (e.g. Orleans) which work well, but my point is that I believe OTP is more flexible for better or worse. Whether that flexibility is worth it to you for what you get is another matter.

Also I'm not sure how to think about binary compatibility between upgrades in such a system.

> There already are static typed actor systems (e.g. Orleans)

Yep, I develop one myself. And have gone to the extent of not allowing senders to even post a message if it's of the wrong type (processes in nodes publish the types they accept to a central store). I initially went along with the 'accept anything' approach (which Akka really majors on too), but found that for the large systems I was developing that it became a real headache to deal with.

> but my point is that I believe OTP is more flexible for better or worse. Whether that flexibility is worth it to you for what you get is another matter.

Yep, fair enough, if it works for you, who am I to complain? It's not worth it for me, because I feel quite strongly that the code I write should understand the types it's working with. It feels like this super-late binding can give false positives, appear to work, when in fact it's not. That scares the shit out of me when systems get large.

Think of it this way: you have a struct type with 4 attributes that you want to pass to another function.

Currently, that function declares it will match on the pattern of those 4 attributes rather than a static type. Now, you update the Node and modify the type on the sending Node to have 5 attributes.

With pattern matching on the 4, everything still works. With static types on the struct the contract is now out of sync.
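
The difference can be illustrated with a small Python sketch, using dictionaries as a stand-in for the struct. The field names and both reader functions are made up for the example, and whether a typed decoder behaves like `strict_read` or like `lenient_read` is a design choice rather than something static typing dictates:

```python
from typing import Any, Dict, Tuple

OLD_FIELDS = ("id", "name", "email", "age")


def lenient_read(msg: Dict[str, Any]) -> Tuple[Any, ...]:
    """Pattern-match style: take the four known fields, ignore the rest."""
    return tuple(msg[field] for field in OLD_FIELDS)


def strict_read(msg: Dict[str, Any]) -> Tuple[Any, ...]:
    """Exact-shape style: any unknown field breaks the contract."""
    if set(msg) != set(OLD_FIELDS):
        raise TypeError(f"unexpected shape: {sorted(msg)}")
    return tuple(msg[field] for field in OLD_FIELDS)
```

Given a message with a fifth field, `lenient_read` returns the four values it knows about, while `strict_read` raises `TypeError`.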

I find this to be a really poor argument. Essentially you're lucky if your systems continue to work as others go off changing message formats without consideration for the code that will receive it?

On a suitably complex/large system this is a recipe for disaster. Things start to slowly rot. It is far better to maintain the old function, accepting the old struct, map it to the new struct and forward it on to the new function that accepts the new struct. Let the old one consume anything that's already queued, or being sent from other nodes that haven't yet been upgraded whilst the new one takes the new format.

False. In erlang, your message passing is "at most once".

If you send a bad message, the receiver will crash or discard it, and that is how it is intended to be.

Erlang embraces the laws of maths and physics.

I would say there should always be a contract between the sender and receiver, whether that's using static types or otherwise. Not having a contract is a nightmare.

For example, say a satellite sends a number to the throttle control in feet/second, but the throttle control thinks it's in m/s. To each of those systems, they're just passing a number and don't know any better.
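
One way to make that contract explicit is to tag the number with its unit. A minimal Python sketch, where `FeetPerSecond`, `MetresPerSecond` and `set_throttle` are hypothetical names and the runtime `isinstance` check stands in for what a static checker would report before the program runs:

```python
from dataclasses import dataclass

METRES_PER_FOOT = 0.3048  # exact by definition of the international foot


@dataclass(frozen=True)
class FeetPerSecond:
    value: float


@dataclass(frozen=True)
class MetresPerSecond:
    value: float


def to_metres_per_second(v: FeetPerSecond) -> MetresPerSecond:
    return MetresPerSecond(v.value * METRES_PER_FOOT)


def set_throttle(speed: MetresPerSecond) -> float:
    """Hypothetical throttle input; rejects anything not tagged as m/s."""
    if not isinstance(speed, MetresPerSecond):
        raise TypeError("throttle expects MetresPerSecond")
    return speed.value
```

Passing a `FeetPerSecond` straight to `set_throttle` is now an error instead of a silently wrong number; the sender has to go through `to_metres_per_second` first.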

Every JSON API call currently works without a contract. In theory it should have one, but in reality it doesn't unless the server (hopefully) validates. Either can change at any time without informing each other.

WSDL based APIs on the other hand have clearly defined contracts at both ends but there's more overhead involved.

It's not an explicit contract, but there must be an implicit one for stuff not to break.

I also agree with this premise. I favor the gradual typing philosophy more and more. For me at least, productivity wise, being able to write something, play around with it, make changes, etc. without worrying too much about satisfying type requirements is great. When the idea and implementation feel solid, go back and gradually add in type requirements.

I would love to see Erlang get an LLVM-based JIT compiler backend. I think this http://llvm.org/devmtg/2014-04/PDFs/Talks/drejhammar.pdf is the latest work done in that area.

There was a talk at eef17 this week from the OTP team on it. It is on the Erlang Solutions YouTube channel.

Sadly I'm on my phone, so I can hardly link it.

I think it is necessary to go with an erlang like style if one wants to get a remotely acceptable cost model.

I am not sure how much static typing actually hinders and how much that is a matter of tooling, though. Maybe static typing could do things like checking whether the new version will be compatible with other nodes before deploying?