by moss2 917 days ago
> Onyx is strictly type-checked. However, the type-inference systems in Onyx usually allow you to omit types.

>

> x := 10

I...

Why is this a feature in every new language? Can't we have a language that is more verbose and explicit, not less? I'd love it if named parameters were mandatory, not optional.

(Named parameters are when you name the arguments you pass to a function, as you can in Python and Groovy: `foobar(arg1: 123, arg2: 'hello')`)

Most of the problems I run into are due to implicit behaviour that no one bothers to explain. In Onyx here, having type-inference means I now have to remember that x := 10 means x will be a signed 32-bit integer instead of, you know, the code that I'm writing remembering it for me.

And I'm just guessing here. Maybe x is a double. Or unsigned since its initial value isn't negative. Or maybe it was signed 32-bit integer but the Onyx developers changed it to 128-bit long long for version 666.0.0. The point is I have to look this stuff up or remember it instead of, you know, it being right there.

I don't even know what you gain by doing this. Less code is less messy but also hides a lot of information from you. Hiding information should be something an IDE does, not the language itself.

Thank you for reading my rant.

13 comments

Well on the other end of the extreme we had Java, which had an awful lot of this:

    HashMap<String, List<Integer>> hashmap = new HashMap<String, List<Integer>>();
Latest Java lets you omit the left type annotation (I hear, haven't tried it):

    var hashmap = new HashMap<String, List<Integer>>();
And in many languages the type parameters on the right can also be inferred --- so long as later code determines them --- so you get something like:

    var hashmap = new HashMap();
Hopefully we can all agree that the first option is needlessly verbose. There's more contention between the last two, I think.
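
For comparison, here is a minimal Rust sketch of the same three spellings. Rust is picked only for illustration, and the function name `three_spellings` is made up. The third map has no written type at all; the later `insert` call is what pins down the key and value types.

```rust
use std::collections::HashMap;

// Hypothetical helper: builds the same map three ways and returns the total entry count.
fn three_spellings() -> usize {
    // 1. Type written on both sides (the old Java style).
    let mut spelled: HashMap<String, Vec<i32>> = HashMap::<String, Vec<i32>>::new();
    // 2. Type written only on the right (like Java's `var`).
    let mut right_only = HashMap::<String, Vec<i32>>::new();
    // 3. Nothing written; later code has to determine the types.
    let mut inferred = HashMap::new();

    spelled.insert(String::from("a"), vec![1, 2]);
    right_only.insert(String::from("a"), vec![1, 2]);
    // This call fixes the key type to String and the value type to a Vec of integers.
    inferred.insert(String::from("a"), vec![1, 2]);

    // Comparing the maps only compiles if all three have the same type.
    assert_eq!(spelled, right_only);
    assert_eq!(spelled, inferred);
    spelled.len() + right_only.len() + inferred.len()
}

fn main() {
    println!("{}", three_spellings());
}
```

To find out what `inferred` is, a reader has to scan forward to the first `insert`, which is the cost being weighed against the shorter declaration.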

My preference would be to do some type inference, but maintain the property that you can tell the type of every expression without looking outside of it (except perhaps for an immediate enclosing function call). This requires, for example:

- The third option isn't allowed, you need to write the second option instead.

- You must annotate function argument types.

- In Rust, you couldn't write `.collect()`, you'd instead write `.collect::<Vec<_>>()`.

- The `x := 10` example is actually somewhat ambiguous. If the language fixes the type of `10` then `x := 10` is legal. If it's an unspecified type (as is typically the case), you'd have to write the type down.
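
A rough Rust sketch of what the rules above would look like in practice (the function `squares` is a made-up example). Argument and return types are annotated, and `collect` names its target on the expression itself:

```rust
// Hypothetical example function; argument and return types are always written out.
fn squares(limit: u32) -> Vec<u32> {
    // The turbofish names the container on the expression itself,
    // so the call can be read without looking at its surroundings.
    (1..=limit).map(|n| n * n).collect::<Vec<_>>()
}

fn main() {
    // Same result, but here the type is visible only on the binding.
    let via_annotation: Vec<u32> = (1..=3).map(|n| n * n).collect();
    assert_eq!(squares(3), via_annotation);
    println!("{:?}", squares(3));
}
```

Both forms compile today; the proposal would only keep the first.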

> The `x := 10` example is actually somewhat ambiguous. If the language fixes the type of `10` then `x := 10` is legal. If it's an unspecified type (as is typically the case), you'd have to write the type down.

For that case I like a type signifier as part of the number literal expression, like this: `x := 10f32` or `x := 10i32`.
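
Rust happens to use exactly this suffix syntax, so here is a small sketch of it (the function name `divide_both` is just for illustration). The suffix on the literal decides which kind of division happens later:

```rust
// Hypothetical helper: the same digits, two different types, two different results.
fn divide_both() -> (i32, f32) {
    let x = 10i32; // suffix says: signed 32-bit integer
    let y = 10f32; // suffix says: 32-bit float
    // Integer division truncates; float division does not.
    (x / 4, y / 4.0)
}

fn main() {
    println!("{:?}", divide_both());
}
```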

I have often used a Java utility class of static inferring generic constructors:

    public static <E> ArrayList<E> newArrayList() { return new ArrayList<E>(); }
    public static <K,T> HashMap<K,T> newHashMap() { return new HashMap<K,T>(); }
So code can look simpler, but just as clear:

    import static GenericConstructors.*;

    ...

    ArrayList<String> names = newArrayList();
    HashMap<String, List<Integer>> hashmap = newHashMap();
The "x := 10" example is one reason it feels safer to me to declare the variable type, and infer the constructor types, than the other way around.

--

What would resolve this whole issue is standardized editor/IDE visualization support for showing all inferred types, just one toggle button/key away.

Inferred types simplify writing code. But when reading code, why should we have to mentally emulate the language's inference algorithm? It is, by definition, supposed to be automating that for us.

Inferring diamond types has been built into the language for over a decade (since Java 7).

Your static utility functions save only 2 characters, but will add massive confusion for other developers.

  List<String> names = new ArrayList<>();
or even better just use var:

  var names = new ArrayList<String>();

Those static functions are neither simpler nor cleaner. Don’t do this.
> Well on the other end of the extreme we had Java, which had an awful lot of this: HashMap<String, List<Integer>> hashmap = new HashMap<String, List<Integer>>();

This is untrue. Inferring diamond types has been built into the language for over a decade.

    Map<String, List<Integer>> map = new HashMap<>();
I don't think you're actually disagreeing, since justinpombrio said Java had a lot of type parameter verbosity, which implies it no longer does.
Would you be willing to use a language that looked verbose in the text files, but you had an IDE that hid the verbose parts from you?

Take your Java example. If the code in the .java file looked like

    HashMap<String, List<Integer>> hashmap = new HashMap<String, List<Integer>>();
but IntelliJ showed it to you like

    hashmap = new HashMap();
I see what you mean, but do appreciate the quality of life improvement of such things. In your example, that looks a lot like a native int. It might be something different, but 99.9% of the time it’s going to be an int. I think it’s reasonable to say that unless told otherwise, a thing that looks like an int is an int. Same with `x:=3.14`. That could be a long double, but it’s vastly more likely to be a plain old float. Why not make that the default?

An extra advantage I see is that non-default types stand out. At a glance at the code, `x:=10` is the most common int type. It’s plain, unadorned. It’s the usual, boring thing. Things that are less common stand out: oh, this is unsigned for some reason. Guess I should see why. I like the pattern where unusual things are easier to visually scan for.

But at the end of the day, darn it, compiler, you know very well that I mean 3.14 to be a float. Stop making me say so each time!

I can't remember which language, but there is one where every number is a double and there is no int type.

Is Onyx like that? I don't know. It would take me maybe two minutes to look up. The point is I have to look it up to be certain when that certainty could be part of the language.

The other side of "why do I have to remember this when I could write it" is "why do I have to write it when the compiler already knows it?"

I love completely inferred type systems; using one has completely changed how I think about types and what I expect from a programming language. I feel ill every time I have to use Go or TypeScript. It's like I am teaching the compiler how to compile. This is not my job, it's a waste of time and energy, it should already know. Things like ReScript and OCaml are where my attention is going for typed languages in the future.

Same reason we don't have to type `(((2int + 2int)int) / 8int)int`. Type inference already exists in any ALGOL-like. Modern languages just tend to remove the arbitrary restriction that variables are the places where types must be added.
In C, there's no type inference. Integer literals have the type "int" by default, and they need a suffix to be unsigned or long. They get truncated and promoted implicitly for convenience's sake.

  /* 'a' : int
   * 'a' is truncated to char (self-evidently safe here)
   */
  char c = 'a';
  /* 1 : int
   * c : char
   * c is promoted to int
   * 1 + c : int
   * 1 + c is truncated to char
   */
  char d = c + 1;
  /* d : char
   * c : char
   * d and c are promoted to int
   * d - c : int
   * d - c is truncated to char
   */
  char e = d - c;
If you want to avoid this behavior, you have to use the type suffixes explicitly, pretty much like that "(((2int + 2int)int) / 8int)int" expression you're making fun of.

  // Not reliably zero: shifting a 32-bit int by 32 bits (its full width) is undefined behavior in C.
  1 << 32;
  // 2^32. "U" makes it unsigned.
  // "LL" makes it "long long", 64-bits on Windows and Linux.
  1ULL << 32;
  // -2147483648 (signed 32-bit) on typical compilers
  // i.e. 0x80000000
  1 << 31;
  // 2147483648 (unsigned 32-bit)
  // i.e. 0x80000000
  1U << 31;
I don't think this is a good thing. It's very confusing.

I prefer the type inference approach, e.g. in Rust, where they're i32 if the type cannot be inferred and the literal has no type suffix. And I like that no two integers can be used by the same binary operator unless they have the same type, so you need to explicitly cast them to the right type.
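
A minimal sketch of the Rust behaviour described here, with the C shift snippets above as the point of comparison. The helper `widen_and_add` is a made-up name:

```rust
use std::mem::size_of_val;

// Mixing integer types needs an explicit, visible conversion.
fn widen_and_add(small: u8, big: i64) -> i64 {
    // `small + big` would be rejected: u8 and i64 are different types.
    i64::from(small) + big
}

fn main() {
    let x = 10; // no suffix, no other constraint: falls back to i32
    assert_eq!(size_of_val(&x), 4);
    assert_eq!(widen_and_add(200, 1_000), 1_200);

    // Shifts, for comparison with the C snippets.
    // Shifting by the full width is reported as None, not silently turned into a value.
    assert_eq!(1i32.checked_shl(32), None);
    assert_eq!(1u64 << 32, 4_294_967_296);
    assert_eq!(1u32 << 31, 0x8000_0000);
}
```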

The integer literals might be misleading from the real point here, which is that every node in an expression tree in C has a type. The compiler infers most of these types. It infers that (1 / 2) is an int and that (1 / 2.0) is a double. That is type inference, and it's exactly the same as the sort of type inference that figures out what "auto" means in C++.
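
To make the "every node has a type" point concrete, here is a small sketch in Rust rather than C. The helper `type_of` is hypothetical, and the exact strings returned by `std::any::type_name` are not formally guaranteed. No annotations appear, yet the compiler has assigned a type to each subexpression:

```rust
use std::any::type_name;

// Hypothetical helper: reports the type the compiler assigned to an expression.
fn type_of<T>(_: T) -> &'static str {
    type_name::<T>()
}

fn main() {
    // No annotations anywhere, yet each expression has a definite type.
    println!("{}", type_of(1 / 2)); // integer division
    println!("{}", type_of(1.0 / 2.0)); // floating-point division
    println!("{}", type_of(1 / 2 == 0)); // comparison
    // Unlike C, `1 / 2.0` is rejected in Rust: there is no implicit int-to-double conversion.
}
```
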
> that (1 / 2.0) is a double.

That is not type inference. 1 has the type int. 2.0 has the type double. Then, 1 gets converted to double. Then the whole expression has the type double. This isn't type inference; this is like saying JavaScript has type inference because it deduces that the expression ('4' - true) has the "number" type (i.e. double precision floating point).

Compare with Haskell, where a numeric literal like 32 has the type (Num a) => a, i.e., it's polymorphic, and the type is actually inferred based on the context it's used in (it could be Int, Integer, Double, Rational, whatever). If you ask it the type at a REPL, it just tells you "32 :: Num a => a", whereas C would tell you that 32 has the type int (if there were a REPL for C).

> 1 has the type int. 2.0 has the type double. Then, 1 gets converted to double. Then the whole expression has the type double. This isn't type inference.

This is just a difference in terminology. To me, what you’re describing is exactly how a type inference algorithm works. This is also the traditional academic definition of type inference, but it sounds like you’re just using it to mean “inference of the types of variables” (which makes sense as the programmer-facing definition, to be fair, because basically all languages have type inference in other places)

> This is like saying JavaScript has type inference

JS doesn’t have type inference because it’s not statically typed, but Typescript does, and it works exactly how you described it.

The difference is, with compiled languages, the compiler needs to know the type of every expression and subexpression ahead of time, to know what code to emit.

I guess you're right. I think of type inference as "there is no pre-set type for a literal; the type is inferred based on the context it's used in." But that isn't the definition of type inference. The definition is just that its type can be deduced at compile-time without explicit annotation. But that means that, as long as you don't have to do explicit casting for every expression (e.g. "(5i32 + 6i32) as i32"), all expressions are type-inferred no matter what programming language you're in.

Like, in Rust, the type of a literal depends on the context of the variable it's later used as.

  // Because of how `a` is used on the line after its declaration,
  // 5 is retroactively reanalyzed as if it were 5u8
  let a = 5;
  let b: u8 = a + 1;
Whereas in C, literals always have a set type, and the only reason you can use, e.g., int literals in non-integer expressions is due to the great amount of implicit type conversions in C.

C++, which has a form of type inference, works differently: the type of a variable is always the same as the type of expression initializing it. The closest equivalent in C++:

  // "a" is inferred to be an "int" because 5 is always of type "int" 
  auto a = 5;
  uint8_t b = a + 1; // Implicit truncation from "int" to "uint8_t"
The true equivalent of an integer literal 5 from C (on 64-bit Linux) in Rust would be 5i32, because it's always the same sign, type, and size in every expression. There's never any doubt about the type of an expression or a literal, and hence no need for some type inference algorithm, only implicit conversions. Hence, the equivalent in Rust of the above C++ is this (depending on the platform):

  let a = 5i32;
  let b: u8 = (a + 1i32) as u8;
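
Both claims can be checked with a short sketch (the function names `sizes_of_five` and `narrow` are made up for illustration). The first function shows the literal's type following its later use; the second writes out the narrowing that C++ does implicitly:

```rust
use std::mem::size_of_val;

// Returns the sizes (in bytes) of two variables that are both initialised with `5`.
fn sizes_of_five() -> (usize, usize) {
    let a = 5;
    let _b: u8 = a + 1; // this use makes `a` a u8
    let c = 5; // nothing constrains `c`, so it falls back to i32
    (size_of_val(&a), size_of_val(&c))
}

// The C++ snippet's implicit narrowing, written out explicitly.
fn narrow(a: i32) -> u8 {
    (a + 1) as u8 // `as` keeps only the low 8 bits
}

fn main() {
    assert_eq!(sizes_of_five(), (1, 4));
    assert_eq!(narrow(5), 6);
    assert_eq!(narrow(255), 0); // 256 does not fit in a u8
}
```
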
Well, technically

    int(int(int(2) +::<int> int(2)) /::<int> int(8))
We must disambiguate integer and floating-point arithmetic operators for sure.
> Same reason we don't have to type `(((2int + 2int)int) / 8int)int`. Type inference already exists in any ALGOL-like.

That's not why you don't have to do that.

Many (most statically typed?) Algol-likes have strict types for literals (e.g., 2 is always a signed int; if you specifically want unsigned you might say 2u, if you want a double you say 2.0, and if you want a single-precision float you say 2.0f, or something) and strict rules for how math between them works and what types it produces. This has been true since long before type inference became common, and is why you don't have to say 2int+2int: 2 is syntactically defined as int.

There is no inference

It is still type inference though. If 2 is an int, the compiler performs type inference to figure out that the expression 2 + 2 is also an int. It is just that traditional languages only use type inference for expressions, not declarations.
It's type propagation. Which can be seen as a form of inference or not, depending just where you draw the line.
So are auto, decltype and templates. In C++ we properly call it type deduction to distinguish it from actual H-M style inference, which C++ lacks. What it is called doesn't detract from the parent's argument.
Yes. I hadn't meant my comment as argument on either side, just added context.
What's the type of the binary + operator in C?
Reasonably, x in this case would be generic/polymorphic (just as 10 is a generic/polymorphic constant) with its type bounded by the equivalent of Num trait. Haskell (and to some extent Go) gets it right.

Also reasonably, each source file should start with declaring the version of a language it was written in, so that e.g. Onyx 666.0 changing its default integer width wouldn't affect the code in the file with "///OnyxVer=665.0" at the top of it. Now that is a feature I would like every new language to have.

I think it’s fine if and only if there are 1st party tools for editors to provide hints about the type the language will infer. If I can mouse over and see the type it thinks the var is, and see static analysis errors if I try to treat it as the wrong type without casting first, then it’s fine.

That is, as long as the core tools for the language provide that info somewhere that’s not more than barely more hidden than explicitly typing it out, I think that’s Ok.

> Why is this a feature in every new language?

Because superfluous type checker incantations are, aside from breaking up flow of thought when writing code, annoying visual noise when reading it.

> I'd love it if named parameters were mandatory, not optional.

Named parameters are a different thing than type incantations, and I agree that it would be good for any non-operator function that takes more than 1 argument to require named parameters.

>Because superfluous type checker incantations are, aside from breaking up flow of thought when writing code, annoying visual noise when reading it.

I disagree. Implicit typing for the most part makes things harder to read. The less I have to guess about what type something is, the better. I never use stuff like "var" or "auto" unless forced to, for this very reason.

I don't see the value of, say, dot_product(vector1=x, vector2=y) compared to dot_product(x, y).
> I don't see the value of, say, dot_product(vector1=x, vector2=y) compared to dot_product(x, y).

I would include functions which implement (especially unary/binary) mathematical operators where the operands either are interchangeable or have a clear conventional order, in languages where you cannot make them actual operators, within the exception for operators.

2 parameters are usually fine, but then I have seen quite a few bugs with atan2(y,x).
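
The atan2 trap is easy to show. A rough sketch in Rust, which has no named parameters, so a struct with named fields stands in for them here (`Offset` and `angle` are made-up names):

```rust
// Hypothetical struct standing in for named parameters, which Rust does not have.
struct Offset {
    x: f64,
    y: f64,
}

// Angle of the offset from the positive x-axis, in radians.
fn angle(o: Offset) -> f64 {
    // Rust spells it receiver-first: y.atan2(x).
    o.y.atan2(o.x)
}

fn main() {
    let up = angle(Offset { x: 0.0, y: 1.0 });
    let also_up = angle(Offset { y: 1.0, x: 0.0 }); // field order does not matter
    assert_eq!(up, also_up);
    assert!((up - std::f64::consts::FRAC_PI_2).abs() < 1e-12);
    // Positional form with the arguments swapped by mistake: gives 0, not pi/2.
    assert_eq!(0.0_f64.atan2(1.0), 0.0);
}
```
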
Programming is not unlike writing in a human language. Sometimes you use a proper noun, sometimes a pronoun, sometimes you omit the subject or object completely, even in rigorous technical writing. There is no hard rule; learning to communicate unambiguously and efficiently is an art.
But that would only help you if you assign the numeric literal directly to a variable. If you use it in an expression like `foo(10)` or `10 * bar` you would still not see the numeric type specified.

So if you want the type to be always explicit, the type specifier should be coupled with the literal like `10[i32]` or similar, so you can write `foo(10[i32])`.

I might not be against that...

Assigning variables or calling methods (when you discard the return value) would look the same as is typical in many languages:

    int x = foo(); // foo() returns int but we can see that in what type x is
    foo(); // we don't care what foo() returns
Named variables in method calls would have to have the type so as to not hide what type a method-as-parameter returns:

    CoolObject cool_obj = bar(int arg1: foo())
Imagine if you had to write English sentences annotating every noun and verb: "(Bob: Person) (went: Verb) to the (beach: Place)." That's what mandatory type annotation feels like to me.
> Can't we have a language that is more verbose and explicit, not less?

What's wrong with the old verbose languages?

No no, I'm with you on this. I hate that Go allows this as well.

If you have a type system, then force the user to specify exactly what type they want something to be. It's completely reasonable in my mind to write var x float32 = 10, the compiler can then deal with adding the .0 if it wants.

Eh, Rust has pretty solid type inference (sounds like you'd hate it), but with the rust-analyzer extension, you can get inlay hints that tell you the inferred type of every variable, so you're never actually left wondering. Seems like the best of both worlds.
I'd prefer the other way around. If the language was verbose but its tools (IDEs, analyzer extension) hid redundant info from you.