Hacker News new | ask | show | jobs
by lacampbell 3351 days ago
Why the fixation on pattern matching? Don't get me wrong, I enjoy the benefit it provides, but for OO code multiple dispatch is a more elegant and idiomatic way to solve the "I don't want to implement the visitor pattern" problem.

FYI, I recently found out that C# can actually do multiple dispatch

https://blogs.msdn.microsoft.com/shawnhar/2011/04/05/visitor...

7 comments

Multiple dispatch is sometimes more elegant, provided you can divorce the method being dispatched to from the argument classes. If you need to do it with a visitor pattern and multiple callbacks, it's much worse. If you need completeness checking, it's worse. If you've got a simple concise algorithm best expressed in a loop, it's worse. If you've got a nested structure that you want to partially destructure, it's worse.

As you've found, overloading is statically typed multiple dispatch; discovering the right overloaded method at runtime is functionally equivalent to multiple dispatch if your overload resolution rules prefer more specific derived types (rather than simply giving up with ambiguity).

would you be able to give me a pseudo code example of things that are easy to do with pattern matching, but not multiple dispatch? I'm intrigued, I always considered them equivalent.
Here's some concrete examples in Rust:

http://pzol.github.io/getting_rusty/posts/20140417_destructu...

It's not just about finding the right bit of code to run; it's also about pulling some fields out of the polymorphic value so they're available as nice short names, without qualification.

I suppose languages from the ML family, and rust, haskell etc are somewhat oriented around nested structures of algebraic types that you want to destructure to pull out little atoms of data out of from some well-defined node. But that's orthogonal to the OO approach, where once all the little atoms of data are sealed up in their protective casing of an object you don't want to deal with them anymore.

C# is a weird beast though - it's pure OO on some level, but then keeps making getters and setters even easier to write. It's easier to write an "object" where every field has a getter and setter, than it is to write an actual constructor for private variables, so that's what people do.

To me it seems pattern matching without possibility to actually enforce that all cases are matched seems to miss the most useful scenario:

What I want to do is create proper sum types. For some reason most OO languages were designed around the idea that being "open for extension" was a good thing. Normally if I make a base class, I want a closed and known set of sub classes. E.g. if I make a payment method then I want to make a Credit or Invoice kind where one requires Credit Card details and the other requires an address. In F# that's a simple sum type. In C# making a sum type is a convoluted mess of making an outer base class and private nested sub classes.

Here a sum type with the two cases plus an exhaustive pattern matching switch would have solved in 10 lines what takes me 300 lines to do in C#. I think the concept of "fixed" or "closed" hierarchies is foreign enough to OO that it probably needs to be a completely separate concept from the normal types. F# shows that it can be implemented as sugar on top of the regular CLR type system. All that's needed is that some keyword such as "enum class" is converted into a sealed hierarchy of plain record classes with no methods on them. The switch statement we already have can then treat these "enum classes" specially by checking that all cases are matched.

A sketch solution for C# could look like

    enum class PaymentMethod
    {
        CreditCard(CrediCardDetails cc),
        Invoice(Address billingAddress),          
    }
    
    // later
    switch (paymentMethod)
    {
      case CreditCard(ccdetails):
         Console.WriteLine($"Creditcard expiring {ccdetails.ExpiryDate}");
      case Invoice(addr):
         Console.WriteLine($"Bill to {addr.Name}");
    }
Writing the above in current C# would require significantly more boilerplate. This isn't the best pattern to use in all situations: but in many cases when the types are merely data carriers it doesn't make sense to put the logic in the class as OO prescribes, so the FP recipe works much better.
In the C# example in the code I posted, all cases are matched. If you add a new animal subclass, that's not "explicitly" handled, it will route to the default (Animal, Animal) method.

It's equivalent of

    _ -> default_function
At the end of a pattern matching statement.

Essentially it's impossible for not all the cases to be matched, as far as I can see.

Yes but making new classes map to default is what a default clause already does (which is basically what a base case in an abstract base class does with normal inheritance).

What I want is to make a new type added mean the code won't compile until it is explicitly handled in all pattern matches that do not have a default clause. Basically what I'm saying is that this:

   bool HandlePayment(PaymentMethod pm) {
      switch(pm) {
         case Invoice(addr):
              SendInvoice(addr); 
              return true;
         case CreditCard(ccdetails)
              return PrcessCreditcard(ccdetails);
         default:
              // What?
      }
   }
Would be a lot better if it didn't require the default, and the compiler could detect the error that occurs when someone adds a new case (BitCoin) to the types of payment method. Inheritance and abstract baseclass for the processing means that you'd do this

   bool HandlePayment(PaymentMethod pm)
   {
      return pm.Process(order);  // sends an invoice, charges credit card etc
   }

   and you simply have 

   abstract class PaymentMethod { abstract bool Process(order); }
   class CreditCard : PaymentMethod {  ... }
   class Invoice : PaymentMethod {  ... }
And this is of course the regular way of writing OO, but I find it unnatural and cumbersome in many cases. It's hard to define concise descriptions of what types are actually available, and you have to write a ton of boilerplate to ensure a closed hierarchy and complete matching in all scenarios where you can't enclose the logic inside each type.
Scala gets this right, in my opinion, with sealed traits and case classes. "enum classes" feel like a much more limited option by comparison. Although I'd take either over the status quo to be sure.
What's the difference between the enum class I outlined and scalas case classes? I don't know Scala but reading the docs on its case classes, their description seem to fit exactly what I tried to describe (immutable types, closed for further inheritance etc).

The example they give for notification = email | sms | voice looks a lot like what I was trying to sketch with the paymentMethod example.

http://docs.scala-lang.org/tutorials/tour/case-classes.html

I didn't intend for my proposal to be more limited than than the Scala case classes at least :). I want exactly that. An FP closed type hierarchy with enforced pattern matching.

Case classes can have methods on them, for one - I can see how that might be added to yours, but it'd probably be very awkward syntactically if you wanted different methods on different cases. To me, it also feels like your enum class is more of a special case. Scala's case classes and sealed traits work together as independent features that come together to provide value. I've used case classes before without using the 'enum-like' property of sealed traits, where I just want to get equality, unapplying, etc... for free.

They also have more configurability - while a case class provides default implementations for things like equality, you can override those (this could be a pro or a con depending on whether you prefer flexibiltiy or performance, I guess - although arguably the flexibility should only have cost if you use it, assuming it's implemented well).

It's definitely not a huge difference, and arguably Scala can suffer from it's philosophy of providing tools that can do something rather than tools for something that means the language often comes across as huge and as having too much stuff in it. I like it, but it's not the C# way right now for sure.

Ah - yes method impl on the cases is certainly a difference. I'd probably choose compatibility with F# sum types if this lands on the CLR though, but I can see how equality and some other things might be useful. Otherwise you are forced to pattern match for all logic which might end up like an inside out object.
I think matching + destructuring solve similar but different problems. Often times, items you don't need to do something based on some unbounded set of things nor do you need an abstraction.

For example, if I am writing a logic system in rust I can either match with one single arm or a nested match implement simple rules like double negation. I don't see how you could easily do this with multiple dispatch like the one you link.

I find myself really wish more languages would implement both the system you link and something like rust's/ML's. Usually it's easier to build multi-dispatch than match syntax though. In rust you can use HashMaps of TypeId -> Box<Any>. I would do a similar thing in C# (though now I am going to use this shiny new feature).

> multiple dispatch

> "I don't want to implement the visitor pattern"

Both multiple dispatch and visitor bring type unsafe dynamism and possible runtime errors. Pattern matching and algebraic types fit well into static type system and are much more safe.

Why does multiple dispatch bring type unsafe dynamism or runtime errors? Or do you just mean C#'s implementation of multiple dispatch?
Yep, there is no way to check the exhaustiveness in the compile time. Say you have an Expression parent class and a list of child classes: say Add, Const, Var, and a corresponding visitor. Add new child class, and the visitor would be non-exhaustive. Multiple dispatch is not very different in that sense.

I think that Alan Key spoke about dynamic nature of OOP, that's one of the examples. Even in the statically typed environment OOP demonstrates its dynamic nature.

I'm lost here. Either people didn't read the article and are just relying on preconceived notions about what stuff can and can't happen with multiple dispatch, or I am missing something blindingly obvious. Anyway, consider this example:

    class Expression {}
    class Add : Expression {}
    class Const : Expression {}
    class Var : Expression {}

    void FSpecialization(Expression e1, Expression e2) { ... }
    void FSpecialization(Const c, Var v) { ... }

    ...

    void F(Expression e1, Expression e2)
    {
        FSpecialization(e1 as dynamic, e2 as dynamic);
    }
How is this not exhausted? Any new sub class of expression will be routed through to the first FSpecialization. Same as it would with a

    _ -> default_f
in algebraic pattern matching.
It's not exhaustive, because you're only dealing with the Const+Var case. You ignore the other 5 cases. Having a default case doesn't suddenly make it exhaustive, it's just an explicit acknowledgement that you can't.

Because what happens if you introduce a new expression, but not realize this means the implementation of F should be updated?

Regarding algebraic pattern matching, you can let Haskell check exhaustiveness and then drop the default case. The compiler will then warn you if you forgot one.

To give you an example, in our codebase we have an enum that is used 3068 times in one solution. A very similar one (you would need domain knowledge to understand the difference) is used 2985 times. Both are not defined in that solution by the way.

It's not out of the question they will receive a new enum member down the line. It would be very useful if the compiler was able to tell you a switch that uses either enum is no longer exhaustive.

If you could exhaustively switch on an enum, you could skip the default: case and have the compiler enforce you covered all cases. (I know, the C# enum==int would make that impossible.)

>How is this not exhausted?

If you would remove FSpecialization(Expression e1, Expression e2) a.k.a wildcard pattern, compiler would not warn you about your dispatching is not exhaustive. And if you'd add another type of expression, all your dispatchers and visitors without wildcard would become non exhaustive, staying valid code from a compiler perspective at the same time.

I'm sorry would you mind going into a bit more depth about that example? It's not obvious to me why the compiler can't figure out the the visitor is no longer exhaustive.
Visitor is just a collection of functions for all type's subtypes. The problem is that visitor knows nothing about class' subtypes, just like the base class knows nothing about it's children, so you have to add visit cases manually for each child class. The only way to detect that visitor is non exhaustive is to apply it to the child class it has no visit function for.
I think both are not exhaustive (or don't have exhaustiveness enforced by the compiler).
> FYI, I recently found out that C# can actually do multiple dispatch

But also good to know, that it can be costly and unsafe. `dynamic` is basically using reflection at run-time, which for most cases is fine, but it's something to keep in mind. It also means that type checking is delayed until run-time instead of compile-time.

A dynamic type is probably just implemented as a tagged bit of data, surely. Which is the exact same way an algebraic datatype would be done. So I dunno why it would be any slower.

I think the safety thing is a non issue as well - what arguments could you pass in to the statically typed function that would cause the use of dynamic to bite you in the ass?

> A dynamic type is probably just implemented as a tagged bit of data, surely

A dynamic "type" is really just `object` whose methods are looked up at run time. It's not tagged data at all. See [1] for more info, specifically the example(s) at the bottom, as the reflection code is what gets executed at run (albeit with caching so that methods aren't looked up all the time).

`dynamic` is a complex feature that enables multiple dispatch as a side-effect, but only because it allows a whole lot more.

[1]: https://visualstudiomagazine.com/Articles/2011/02/01/Underst...

> what arguments could you pass in to the statically typed function that would cause the use of dynamic to bite you in the ass

    public void Output(int value);
    public void Output(Person person);
    ...
    dynamic aValue = "Some String";
    Output(aValue);
Compiles, because the actual resolving of which overload to call is done at runtime, not at compile time. And at runtime, there is no overload that accepts a string.
I am talking about what value you could pass into the multiple dispatch code in the article I linked to. I know you can cause run time type errors using dynamic in general, but in the code I linked it was very well contained and water tight.

Dynamically typed languages usually represent objects as something like a struct with an int tag to represent the type, and a void pointer for its value. The actual reflection going on in the code I posted for multiple dispatch would surely be nothing more than an int comparison - same as what Ocaml probably does.

> I am talking about what value you could pass into the multiple dispatch code in the article I linked to

Ah, I misunderstood the question. Yes, in that case it's fairly type-safe.

> The actual reflection going on in the code I posted for multiple dispatch would surely be nothing more than an int comparison

Is this based on gut reaction or are you talking about optimizations that `dynamic` performs? I ask because my understanding is that `dynamic` is like a really efficient reflection-emitter, that attempts to do at runtime the same thing that the compiler would do at compile time, but slower because it has to look stuff up via reflection.

From Eric Lippert [2]:

> The magic is: the compiler emits code that starts the C# compiler again at runtime. The runtime version of the compiler analyzes the call as though the compile-time types of all the objects had been their actual runtime types, generates an expression tree representing that call, compiles the expression tree, caches the delegate for next time, and runs the delegate.

[2]: http://stackoverflow.com/questions/10330805/c-sharp-multiple...

So maybe `dynamic` at runtime figures out that it can do a "a struct with an int tag to represent the type, and a void pointer for its value", but I wouldn't assume it does and from what I've read on it, I wouldn't think that it does.

Base classes already have to carry their runtime type with them anyway. As do nodes of an algebraic data type. You may well be right that the compiler might not be "sufficiently smart" to see this very limited use of dynamic and realise no special reflection magic is actually required. But considering dynamic is only used to actually pull out the runtime type in the example code, and not to call arbitrary methods on it that may or not be there, I assumed - maybe wrongly - that the runtime hit would be the same as the branching that occurs in Ocaml or Haskell or F# when it comes across a "match" statement.

But to answer your question - yes, entirely based on gut reaction and what a compiler would ideally be able to do given the code posted. So very probably wrong.

> I think the safety thing is a non issue as well - what arguments could you pass in to the statically typed function that would cause the use of dynamic to bite you in the ass?

None if the function is written correctly, but by that logic you don't need type safety at all. The point is that the dynamic technique gives you no warning if you forget to implement one of the cases.

My point was that the dynamic technique in the posted code was written in a very small, contained point where it would be impossible due to the signature of the surrounding argument for it to go wrong.

    void React(Animal me, Animal other) 
    { 
        ReactSpecialization(me as dynamic, other as dynamic); 
    }
Unless you pass in a casted value here, it's fool proof.
The point is that if you forget to write one of the ReactSpecialization methods nothing detects it. Perhaps more importantly, if you add a new type of Animal there's nothing to tell you you need to write a bunch of new ReactSpecializations.
Aha! There's the crux of an argument. It's a valid point actually, that ties into the expression problem.

As an OO apologist I do feel compelled to point out that

    type Animal = Cat | Dog | Mouse

    let react = function (a, b)
    | Cat, Dog -> ...

    ...

    | x, y -> $"{x} is not interested in {y}."
Would suffer from the exact same issue though.
No fixation. It can just be more convenient sometimes (and less convenient in others).
wow. I havent used C# for at least three years, but that is a really powerful feature.
What's even better is that the DLR consists of some pretty tight code. Historically I was using a ConcurrentDictionary containing Funcs compiled using Expression. I've found the DLR is just as fast (for things that you can express with it), with a much lower overhead. I've started using it in dynamic scenarios wherever possible.

Basically, this is going to be about as fast as you can get with the statically-typed underlying runtime.

> I've found the DLR is just as fast... with a much lower overhead

Can you expand on this? I would not have expected that it would be faster than compiled expressions.

The codegen creates a cache of delegates (created using expression trees), but uses a structure more suited to the problem than a simple ConcurrentDictionary.

[1]: http://geekswithblogs.net/simonc/archive/2012/07/20/inside-t...

Ah, faster than the `ConcurrentDictionary`. That makes sense!
Dynamic calls gets compiled at runtime, so there's an initial hit (as there would be with Expr.Compile) but after that ...