by tel 4083 days ago
Why do people find that type erasure is a problem? I'm happy, for instance, writing in languages wherein type erasure is the obvious right answer.
3 comments

I don't often find myself needing to get T, but sometimes I do. Consider the case of a Handler<T extends Foo> interface that several classes implement. If I have several of these, for various Foo subtypes, I might want to put them in some sort of collection that I can look into later after receiving a Foo message, so that I can dispatch it appropriately. Why do I have to do this dispatch myself? The types of all the Handlers go away, so I need to hold on to them somehow, and match at runtime.

The clear way to do this is to put them in a Map<Class<T>, Handler<T>>. Now how do these Handler objects declare that they're ONLY able to handle a specific Foo subtype (let's call it FooBar)? It'd be REALLY nice if I could just say "Hey handler, what's your generic type? Is the message I just got an instanceof your generic type?" Java won't let you do this due to erasure.

Okay, we can get around this. Each Handler<T> has to declare a method (say getType()) that returns the Class<T>. Since I'm declaring my class as FooBarHandler implements Handler<FooBar>, I can protect myself from returning a BazBat in this method, but there's NO way for me to abstract this method away. Each Handler has to declare a "public Class<T> getType()" that returns T.class. But since T isn't a thing past compile time, I have to repeat this same method implementation for each concrete type. Gross. This has the added irritation of forcing all the parent classes, if they implement this interface, to be abstract, since they can't implement the method appropriately. This isn't necessarily a bad thing, but the limitation is annoying.

In an ideal world, I can declare some ParentHandler<T> that has this method, and have all my handlers just extend it, with no duplication.
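
For concreteness, here is a minimal sketch of the workaround described above. The names Foo, FooBar, BazBat, Handler and getType() come from the comment; HandlerRegistry, register, dispatch and the String return values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical message hierarchy standing in for the thread's Foo / FooBar / BazBat.
class Foo {}
class FooBar extends Foo {}
class BazBat extends Foo {}

interface Handler<T extends Foo> {
    Class<T> getType();        // every concrete handler has to repeat this by hand
    String handle(T message);  // String result is only here so the sketch is observable
}

class FooBarHandler implements Handler<FooBar> {
    @Override public Class<FooBar> getType() { return FooBar.class; }
    @Override public String handle(FooBar message) { return "handled FooBar"; }
}

class BazBatHandler implements Handler<BazBat> {
    @Override public Class<BazBat> getType() { return BazBat.class; }
    @Override public String handle(BazBat message) { return "handled BazBat"; }
}

// Hypothetical registry: keeps the Class object next to each handler because
// the handler's type argument is not available from the instance at runtime.
class HandlerRegistry {
    private final Map<Class<? extends Foo>, Handler<? extends Foo>> handlers = new HashMap<>();

    <T extends Foo> void register(Handler<T> handler) {
        handlers.put(handler.getType(), handler);
    }

    String dispatch(Foo message) {
        // Exact-class lookup; subclasses of a registered type are not matched.
        Handler<? extends Foo> handler = handlers.get(message.getClass());
        if (handler == null) {
            return "no handler";
        }
        return dispatchTo(handler, message);
    }

    // Captures the wildcard as T so the runtime-checked cast matches the handler's type.
    private static <T extends Foo> String dispatchTo(Handler<T> handler, Foo message) {
        return handler.handle(handler.getType().cast(message));
    }
}
```

The two getType() bodies are the duplication being complained about: each one is a single line, but it cannot be hoisted into a shared parent as T.class.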

> It'd be REALLY nice if I could just say "Hey handler, what's your generic type? Is the message I just got an instanceof your generic type?"

No, it wouldn't be nice, it would be unsafe. If one day you end up adding a new type in your container, you need to update your runtime check as well or your code will fail in mysterious ways.

Erasure keeps you honest by asking you to think carefully about the types so that they can be checked by the compiler, and once the compiler has done this verification, you are guaranteed that your code will work.

Any language feature that encourages the use of reflection, such as reification of types, should not be supported by a language that wants to claim to be sound.

It's not necessarily unsafe, but it is very difficult to do safely. The design suggested is certainly unsafe, however: there's no way to ensure that the values don't lie about their self-reported type, so using that to trigger a coercion is liable to blow up.

If the compiler provides a couple things:

    data TypeRepr
    typeReprEq :: TypeRepr -> TypeRepr -> Bool
    
    typeOf :: Typeable a => a -> TypeRepr
such that TypeRepr cannot be generated (e.g. faked) by users, typeOf is guaranteed to be genuine, and (this is the hardest) such a thing doesn't violate parametricity then you can use that interface to write

    safeCoerce :: (Typeable a, Typeable b) => a -> Maybe b
which is guaranteed to only allow the coercion when `a` and `b` actually happen to be the same type.
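
For readers following along in Java, a rough analogue of that interface is sketched below. It is only an approximation: the class name SafeCoerce is made up, Class<B> stands in for TypeRepr, and the check accepts subtypes rather than only identical types. A Class object also cannot tell List<String> from List<Integer>, which is exactly where erasure comes back in.

```java
import java.util.Optional;

class SafeCoerce {
    // Class<B> plays the role of the type representation that users cannot fake;
    // isInstance is the runtime check, and cast is only reached when it passes.
    static <B> Optional<B> safeCoerce(Object value, Class<B> target) {
        return target.isInstance(value)
                ? Optional.of(target.cast(value))
                : Optional.empty();
    }
}
```
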
Ah, gotcha. This "project out of a polymorphically typed heterogeneous container" problem shows up in Haskell from time to time, too. Generally, people just learn to do without it even to the point of considering it a bad practice to try. I'm not going to claim that this is the right way for Java... but it's interesting to note that this is a pain point that's perceived as a problem with type erasure in the Java world. In the Haskell world it's considered to be a good design principle enforced by the "obvious" step to erase types.
It's not necessarily that people regularly see it as a problem, it's just that sometimes you get yourself put into a type corner and the lack of it causes you to have to repeat yourself. I'm not sure what the alternative world would look like, but I feel like this means the type system simply isn't expressive enough to cover this particular problem.

Upon thinking about it a bit, this problem seems to be functionally equivalent to pattern matching on the type of the message (which Java also lacks). I'm not a Haskell guy, but the immediate solution I'm seeing is to just have separate cases for each one. This is still an inferior solution in this particular case, because it forces you to modify the code in two places; luckily the type system helps here and makes sure you do.

But consider a problem of a different form:

Imagine I have an object Bar<S, T> and I want to have a frobnicate(S s) and a frobnicate(T t) method. Since Java erases, I can't do this. frobnicateS and frobnicateT it is! This is particularly annoying because you know S and T are different, but you can't express it to the type system! This seems like it'd be solvable by some sort of disjoint union, but again, Java lacks this. Fun fact, it does have a way to bind a generic on multiple classes: Bar<T extends Baz & Bat>. It would be natural to add a | for this sort of operation, if they ever figured out how to do it.
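
A minimal sketch of what that looks like in practice; the String return values are only there to make the behaviour observable.

```java
class Bar<S, T> {
    // Declaring both frobnicate(S s) and frobnicate(T t) here is rejected by javac
    // as a name clash: after erasure both signatures are frobnicate(Object).
    // Distinct method names are the workaround mentioned above.
    String frobnicateS(S s) { return "S:" + s; }
    String frobnicateT(T t) { return "T:" + t; }
}
```
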

You can take advantage of lambda structural typing to work around the frobnicate overload issue http://benjiweber.co.uk/blog/2015/02/20/work-around-java-sam...

I don't think the "same erasure" overload problem is actually a consequence of type erasure; it is just defined that way in the spec.

I think generally the point is that "forgetting to a generic" is a dangerous operation because recovering that forgotten information is risky. It's inconvenient, but the alternatives are worse. Reflection, especially universally, blows up the amount you can trust your types very significantly.

Your Frobnicate example is interesting since you want an extra level of polymorphism in there, but getting this can be bad for inference (see MultiParamTypeClasses without functional dependencies). In any case, it's not clear why you ought to expect it to work.

> It'd be REALLY nice if I could just say "Hey handler, what's your generic type?"

There are better ways of resolving type arguments, such as on Handler implementations, at runtime.

See https://github.com/jhalterman/typetools
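
To give a rough idea of the underlying mechanism, here is a sketch that uses only JDK reflection. It is not typetools' own API, and HandlerTypes and handledType are made-up names. It relies on class files keeping the generic signatures of declarations, even though instances carry no type arguments.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class Foo {}
class FooBar extends Foo {}

interface Handler<T extends Foo> {
    void handle(T message);
}

class FooBarHandler implements Handler<FooBar> {
    @Override public void handle(FooBar message) {}
}

// A handler that is itself generic: its T is still a type variable in the declaration.
class GenericHandler<T extends Foo> implements Handler<T> {
    @Override public void handle(T message) {}
}

class HandlerTypes {
    // Returns the T of a class that directly implements Handler<T> with a concrete
    // class argument, or null when it cannot be determined from the declaration.
    static Class<?> handledType(Class<?> handlerClass) {
        for (Type iface : handlerClass.getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) iface;
                Type arg = p.getActualTypeArguments()[0];
                if (p.getRawType() == Handler.class && arg instanceof Class) {
                    return (Class<?>) arg;
                }
            }
        }
        return null;
    }
}
```

This only inspects directly implemented interfaces; walking superclasses and resolving type variables through a hierarchy is the tedious part a library would take care of.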

Right. And now you're pushing what should be a compile-time error to runtime, and slowing down performance with runtime checks to boot.
Indeed, but it gives library authors the ability to write nicer APIs without having to pass Class references around; they can be resolved instead.
One common case that I've encountered:

There's no (good) way of going "this is a Comparable<Foo> and a Comparable<Bar>, but not a Comparable<Baz>" that's detectable at runtime. (Also true with other interfaces.) Especially for trait interfaces.

There are also nasty cases where you end up crashing randomly somewhere else in your code at runtime because somewhere something passed a List<Foo> and it expected a List<Bar>, but somewhere in between the type information got lost.
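
A small sketch of that failure mode, with made-up names and with String/Integer standing in for Bar/Foo. The bad insertion goes through a raw type with nothing worse than an unchecked warning, and the ClassCastException only shows up later, where the element is read back.

```java
import java.util.ArrayList;
import java.util.List;

class HeapPollutionDemo {
    // Somewhere in between, the element type got lost: a raw List accepts anything.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void addInteger(List raw) {
        raw.add(42);
    }

    static String firstElement(List<String> strings) {
        // The compiler-inserted cast to String fails here, far from the real bug.
        return strings.get(0);
    }

    // Returns true if the failure surfaces at the read rather than at the write.
    static boolean failsFarFromTheBug() {
        List<String> strings = new ArrayList<>();
        addInteger(strings);  // succeeds silently at runtime
        try {
            firstElement(strings);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```
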

Ditto, I find myself wishing to be able to do instanceof T / T[] (that's an array of T) annoyingly often.
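
The usual stand-in, sketched here with a made-up TypedBox class, is to carry a Class<T> token around and use it wherever the language refuses to use T directly.

```java
import java.lang.reflect.Array;

class TypedBox<T> {
    private final Class<T> type;  // explicit token standing in for the erased T

    TypedBox(Class<T> type) {
        this.type = type;
    }

    // Stand-in for the illegal "value instanceof T".
    boolean accepts(Object value) {
        return type.isInstance(value);
    }

    // Stand-in for the illegal "new T[length]"; the cast is unchecked but the
    // array really has component type T because it is built from the token.
    // Assumes the token is a reference type, not something like int.class.
    @SuppressWarnings("unchecked")
    T[] newArray(int length) {
        return (T[]) Array.newInstance(type, length);
    }
}
```
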

Look at the mess that is EnumMap for an illustration of why type erasure can be frustrating. The amount of reflection done at runtime for what should be (and is, if you write a non-generic implementation, or if you're using a language that's sane) purely compile-time work is... frustrating. (And while you're at it, the fact that you can even attempt to pass in a non-enum into EnumMap. I thought Java was supposed to try to push such errors into compile time?)

> There's no (good) way of going "this is a Comparable<Foo> and a Comparable<Bar>, but not a Comparable<Baz>" that's detectable at runtime.

It's a shame that various libraries/frameworks have resorted to building APIs around Type Tokens - where users have to create anonymous classes - to create something that effectively reifies some generic type information. It's a lame solution, but a solution.
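
For anyone who hasn't seen the pattern, a bare-bones sketch follows. TypeToken here is a made-up minimal class, not any particular library's implementation. The anonymous subclass exists only so that its generic superclass declaration, which is kept in the class file, can be read back reflectively.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        // For "new TypeToken<List<String>>() {}" this is TypeToken<List<String>>.
        Type superclass = getClass().getGenericSuperclass();
        if (!(superclass instanceof ParameterizedType)) {
            throw new IllegalStateException(
                    "TypeToken must be created as an anonymous subclass with a type argument");
        }
        this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }
}

class TypeTokenDemo {
    // The trailing {} creates the anonymous subclass the pattern depends on.
    static Type listOfString() {
        return new TypeToken<List<String>>() {}.getType();
    }
}
```

Nothing ties the captured Type to the values actually passed alongside it, so any cast based on the token is only as trustworthy as the code that created it.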

It's a dangerous solution as well given that it relies on user land extensible code to be correct in order to ensure type safety. Generally this is true whenever you have coercion, but type tokens done wrong give an (invalid!) excuse to coerce.
People, including those who gave us Java's generics, have always argued that erasure is a problem. It wasn't something that was desired, but rather a necessary evil for easier backwards compatibility.
As I understand it, it also turned out to be useful for more than backwards compatibility (though that was entirely the reason for it): it's why you could have Scala with its richer type system and a good interop story with Java, whereas the attempt to provide the same thing on .NET faltered in large part because of the .NET platform's reification of generics.
It was never about "backwards compatibility". It was always entirely possible to introduce generics and still have both the compiler and the runtime perfectly backwards compatible. C# did this going from version 1 to 2 of the CLR. You can still run any bytecode compiled for version 1 on the version 4 of the runtime to this day.

No, the issue was a much more esoteric one (and an invented and self-inflicted constraint by the designers themselves) of migration compatibility. Consider entity A using compiled libraries (no source available) from vendor R and from vendor S in the same product. Vendor R releases a new version of his library and uses the fancy new generic collections, and refuses to maintain a non-generic version. Vendor S releases a new version of his library but refuses to provide a version that uses generic collections. Library S calls code in library R. All of these need to be in place to require migration compatibility.

> it's why you could have Scala with its richer type system and a good interop story with Java, whereas the attempt to provide the same thing on .NET faltered in large part because of the .NET platform's reification of generics.

Do you have a source for this? In my view it is the opposite: Reification is the right solution as it makes reified types part of the first-class system with no strange corner cases. It is type erasure that treats realized generic types as second-class citizens, with strange constraints bubbling up through the entire system.

> Do you have a source for this?

There's a lot of second-hand discussion of this readily locatable on Google, but I haven't saved and can't readily find the posts from people actually involved in implementing Scala on .NET that I saw years ago that directly pointed to this as a problem.