Hacker News new | ask | show | jobs
by chriswarbo 4269 days ago
> What about compilers for dynamic languages which reason about type, like that if the program calls (symbol-name x), it is inferred that x must hold a string, so that a subsequent (cos x) is inconsistent?

That's irrelevant; if the program containing (symbol-name x) and (cos x) works in an interpreter but not in the compiler then the compiler is broken. Note that throwing-exceptions/triggering-errors/etc. is "working", since it's a valid form of control flow which the program could intercept and carry on. Aborting immediately (like a segfault) isn't "working". Triggering undefined behaviour also isn't "working", but since most of these languages lack any real defined behaviour, it's assumed that "whatever the official interpreter does" is correct.

Usually, these compilers completely acknowledge that their behaviour is incorrect WRT the interpreter. They embrace this and define a new, static language which just-so-happens to share a lot of syntax and semantics with that dynamic language. For example, HipHop for PHP could not compile PHP; it could only compile a static language which looks like PHP. Likewise for ShedSkin and Python. PyPy even gives its static language a name: RPython.

Hence, these compilers don't show that runtime tags and static types are the same/similar. Instead, they show that sometimes programmers without static types will use tags to do some things that static types are often used for.

2 comments

> if the program containing (symbol-name x) and (cos x) works in an interpreter but not in the compiler then the compiler is broken.

The compiler isn't broken.

The interpreter throws an exception, which meets your own definition of "works".

The compiler gives an opinion "x has an inconsistent type". In this manner, the code also "works" in the compiler.

You can still run the program if you like; then that opinion becomes an exception if an input case exercises the code, proving the compiler's opinion right.

Compilers for dynamic languages do not preserve all interpreted behaviors. This is usually explicitly rejected as a goal: users must accept certain conventions if they want code to behave the same in all situations.

For instance, some Common Lisp implementations re-expand the same macros during evaluation, which is friendly toward developers when they are modifying macros. But in compilation, macros are expanded once.

Usually these compilation-interpretation discrepancies are obscure; we are not talking about obscure features here. It's a basic tenet of type checking in a compiler that it will flag things that will throw type-related exceptions at run time. You cannot say that type checking compilers for dynamic languages are simply not allowed because type mismatches are well-defined behavior; that's simply outlandish.

> The compiler gives an opinion "x has an inconsistent type".

> You can still run the program if you like; then that opinion becomes an exception if an input case exercises the code, proving the compiler's opinion right.

In this example, the compiler cannot give "x" the static type of "string" and also allow the program to handle errors when "x" is used in places where strings aren't allowed. If the static types don't match, the result is undefined; that's what static typing means.

If the compiler causes an exception to be thrown, then either:

- That's just an implementation-specific quirk, which just-so-happens to be the way this compiler behaves when it hits this undefined case; it's not part of any spec/documentation and useful only to hackers trying to exploit the system.

- We accept that the compiler didn't assign "x" the static type of "string" after all; it actually assigned it "string + exception + ..." (which may be the "any" type of the dynamic language, or some sub-set of it)

The situation isn't one of static typing, but of static type checking. Check (parent (parent (parent yourpost))) where it says

> What about compilers for dynamic languages ...

A compiler for a dynamic language that checks types does not bring about static typing. The language in fact remains dynamic.

What the checking means is that the compiler can give an opinion based on a static view of the program, and we can run that program regardless.

The compiler can say: yes, all expressions in the program can be assigned a type; no, some expressions have a conflicting type, or couldn't be assigned a type.

Static typing indeed means that we use the result of the static analysis to remove all traces of type from the program, and only run it when its type information is complete and free of conflicts.

The dynamic language optimizer can in fact take advantage of its findings to eliminate run-time type checks where it is safe to do so.

> What the checking means is that the compiler can give an opinion based on a static view of the program, and we can run that program regardless.

> The compiler can say: yes, all expressions in the program can be assigned a type; no, some expressions have a conflicting type, or couldn't be assigned a type.

What is a "conflicting type"? In the dynamic language, there is only one static type (any = int + string + float + (any -> any) + array(any) + ...), so there can't be any conflicts.

It's fine to have a compiler try to specialise the types of variables beyond that, eg. narrowing-down the type of "x" in the example to "string + float", based on how it's used (or just leave it as "any").

A "conflict" would imply that a variable is used in ways that don't match the inferred type; but the inference is based on how it's used.

PS: Static type inference, like your example of (symbol-name x) and (cos x), has been going on for at least 40 years, since ML. The article mentions, Ocaml, which is a descendent of ML and hence has full type inference (you can leave out all annotations and the type-checker will figure them out). Depending on the type system, this may or may not be possible. For example types in Agda, Coq and Idris are full programs, so inferring them is undecidable.

Since dynamic languages only have one static type, static type inference is trivial. So is type-safety: there's no way to cast a value of one type to an incompatible type, since there are no other types.

Inferring tags doesn't work in general: when I say `print x` it doesn't mean that `x` must be a string; I might want Python to raise a "TypeError" exception, so that I can jump to the handler where all the real work takes place. Of course that's bad practice, but exceptions are only "errors" by convention; in the same way that False is only a failure by convention.