by jcrites 3715 days ago
The notions of "strong" and "weak" typing have never been particularly well-defined, but I think my usage is in line with their usual meaning: https://en.wikipedia.org/wiki/Strong_and_weak_typing

> Some programming languages make it easy to use a value of one type as if it were a value of another type. This is sometimes described as "weak typing".

Strong typing makes it difficult to use a value of one type as if it were another. In PHP, you can compare the integer value 1 to the string value "1" and the equality test returns boolean true. Conflating integer 1 and string "1" is weak typing. A data format that expresses the concept of the timestamp 1999-12-31T23:14:33.079-08:00 using the same fundamental type as the string "Party like it's 1999!" is what I would call weakly typed.

Ion does not make it easy to use a string as if it were a timestamp or vice versa. It has types like arbitrary precision decimals, or binary blobs, that can't easily be represented in a strongly-typed way in JSON. You can certainly invent a representation, like specifying strings as ISO 8601 for timestamps, or an array of numbers for binary -- actually, wait, how about a base64-encoded string instead? Where there's choice there's ambiguity. These concepts of "type" live in the application layer in JSON, instead of in the data layer like they do in Ion.
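To make the "where there's choice there's ambiguity" point concrete, here is a minimal sketch in Python using only the standard `json` and `base64` modules. The field names (`blob`, `when`, `note`) are made up for illustration and are not from any real schema.

```python
import base64
import json

payload = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# Two equally plausible application-level conventions for the same bytes.
as_number_array = json.dumps({"blob": list(payload)})
as_base64_string = json.dumps({"blob": base64.b64encode(payload).decode("ascii")})

decoded_array = json.loads(as_number_array)["blob"]
decoded_string = json.loads(as_base64_string)["blob"]

# JSON itself only reports "array" and "string"; neither one says "binary".
print(type(decoded_array).__name__, decoded_array)
print(type(decoded_string).__name__, decoded_string)

# A timestamp and an ordinary sentence come back as the same type.
doc = json.loads(json.dumps({
    "when": "1999-12-31T23:14:33.079-08:00",
    "note": "Party like it's 1999!",
}))
same_type = type(doc["when"]) is type(doc["note"])
print("timestamp and sentence share a type:", same_type)
```

A reader of either document has to know, out of band, which convention the writer picked.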

Note as well that stronger is my term. The Ion documentation says "richly-typed". Certainly Ion does not include every type in the world. Perhaps a future serialization framework might capture "length" with a unit of "meters", or provide a currency type with unit "dollars", and if that existed I'd call it stronger-(ly?)-typed or more richly typed than Ion. In that case, the data layer would prevent you from accidentally converting "3 inches" to "3 centimeters", since those would be different types. That would be stronger typing than an example where you simply have the integer 3, and it's the application's job to track which integers represent inches, and which represent centimeters. So perhaps "strong" and "weak" are not the best terms, so much as "stronger" and "weaker".
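As a rough illustration of what "different types for inches and centimeters" could look like, here is a small Python sketch. The `Inches` and `Centimeters` classes are hypothetical, and this is an application-level stand-in checked at runtime, not a feature of Ion or of any serialization format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Centimeters:
    value: float

    def __add__(self, other):
        # Only lengths in the same unit may be added.
        if not isinstance(other, Centimeters):
            return NotImplemented
        return Centimeters(self.value + other.value)


@dataclass(frozen=True)
class Inches:
    value: float

    def to_centimeters(self) -> Centimeters:
        # The conversion exists, but it has to be requested explicitly.
        return Centimeters(self.value * 2.54)


three_in = Inches(3)
three_cm = Centimeters(3)

# Same number, different types: they do not compare equal.
conflated = (three_in == three_cm)

# Mixing units without converting is rejected instead of silently adding 3 + 3.
try:
    three_cm + three_in
    mixing_rejected = False
except TypeError:
    mixing_rejected = True

total = three_cm + three_in.to_centimeters()
```

With a bare integer 3, none of these checks would be possible; tracking the unit would be left entirely to the application.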


By your definition, any language with strings is weakly typed, since you can always interpret a string as being something else. Strongly/weakly typed has never been a particularly useful description (as the page you linked notes), and I think it's particularly unhelpful here.
> By your definition, any language with strings is weakly typed, since you can always interpret a string as being something else

No, I wouldn't say that's the case. For example, in PHP you can literally write:

  if (1 == "1") { ...
... and the condition evaluates to true. You can do similar things in Excel; Excel doesn't even really differentiate between those two values in the first place. (At least that's how it seems as a casual user.)

This is not the case in strongly typed programming languages that have strings such as C++ or Java. You can convert from one type to another, sure, by explicitly invoking a function like atoi() or Integer.toString(), but the conversion is deliberate and so it is strongly typed. A variable containing a string (java.lang.String) cannot be compared against one containing a timestamp (java.util.Date) by accident. An Ion timestamp is a timestamp and can't be conflated with a string, although it can be converted to one.
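The same distinction can be sketched in Python, which is dynamically typed but does not coerce between strings and numbers the way PHP's `==` does. This is only an illustration of the "conversion must be deliberate" idea. It says nothing about C++ or Java specifically, and the variable names are made up.

```python
from datetime import datetime, timezone

# No implicit coercion: an int and a str never compare equal.
loose_equal = (1 == "1")

# Conversion is possible, but only when asked for explicitly.
explicit_equal = (1 == int("1"))

text = "1999-12-31T23:14:33"
stamp = datetime(1999, 12, 31, 23, 14, 33, tzinfo=timezone.utc)

# Ordering a string against a timestamp is rejected rather than guessed at.
try:
    _ = text < stamp
    mixed_comparison_rejected = False
except TypeError:
    mixed_comparison_rejected = True

# An explicit conversion turns the timestamp into a string on request.
stamp_as_text = stamp.isoformat()
```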

Edit: The set of types that are built in, in conjunction with how those types are expressed in programming languages (e.g. timestamp as java.util.Date, decimal as java.math.BigDecimal, blob as byte[]), is why I'd call Ion strongly typed or richly typed in comparison to JSON. Specifically, scalar values that frequently appear in common programs can be expressed with distinctly typed scalar values in Ion. I don't know if there's a good formal definition. You could probably define a preorder on programming languages or data formats based simply on the number of distinct scalar or composite types (so in that sense, yes, it's the fact that Ion has more). However it goes beyond that subjectively. Subjectively it's about how often you have to, in practice, convert from one type to another in common tasks. There is no clear way to represent an arbitrary-precision decimal in JSON, or a byte array, or a timestamp -- so you must "compress" those types down into a single JSON type like string-of-some-format or array-of-number; and several different scalar types must all map to that same JSON type, which creates the risk of conflating values of different logical types but the same physical JSON type with each other. There's no obvious or built-in way to reconstruct the original type with fidelity. There's no self-describing path back from "1999-12-31T23:14:33.079-08:00" and "DEADBEEFBASE64" back to those original types.
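Here is a small Python sketch of that "compression" and the missing path back, using the standard `json` module. The `to_jsonable` helper and the field names are hypothetical; the string formats are just one convention an application might choose.

```python
import json
from datetime import datetime, timedelta, timezone
from decimal import Decimal

original = {
    "when": datetime(1999, 12, 31, 23, 14, 33, 79000,
                     tzinfo=timezone(timedelta(hours=-8))),
    "price": Decimal("19.990"),
    "note": "Party like it's 1999!",
}


def to_jsonable(value):
    """Application-chosen convention: squeeze richer types into strings."""
    if isinstance(value, datetime):
        return value.isoformat(timespec="milliseconds")
    if isinstance(value, Decimal):
        return str(value)
    raise TypeError(f"cannot encode {type(value).__name__}")


encoded = json.dumps(original, default=to_jsonable)
decoded = json.loads(encoded)

# Three different logical types went in; one physical JSON type comes out.
original_types = {key: type(val).__name__ for key, val in original.items()}
decoded_types = {key: type(val).__name__ for key, val in decoded.items()}
print(original_types)
print(decoded_types)
```

Nothing in `decoded` says which strings were timestamps or decimals; the reader needs the writer's convention to reconstruct them.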

I subjectively call JSON weakly typed because its types are not adequate to uniquely store common scalar data types that I work with in programs that I write. I call Ion strongly typed because it typically can. I acknowledged earlier that a data format would be even more strongly typed if it were capable of representing not just the type "integer", but "integer length meters". Ion does not have this kind of type built in, though its annotations feature could be used to describe that a particular integer value represents a length in meters.

> You can't misuse any kind of Ion value that is a string as if it were a timestamp without performing an explicit conversion.

The same is true of JSON. There is no difference, except that Ion has a timestamp type and JSON does not.

If you disagree, please identify what characteristic of Ion's design makes it more strongly typed than JSON, other than the set of types that is built in.

You are choosing a definition of strong typing that supports your argument, but the argument is over the meaning of strong typing to begin with. It's not as if there's some universally accepted definition of strong typing. Like functional programming, functional purity, object oriented, etc.—none of these terms are universally defined.
The fact that "strong typing" has no universal definition is exactly why I think it's not useful.
I hate feeling like I'm nitpicking, but I don't think that's true. I think they do have a well-accepted definition, which appears in Wikipedia, in assorted articles online, and in computer science publications. Here are some examples of CS publications that describe a research contribution in terms of strong typing:

> Strong typing of object-oriented languages revisited. This paper is concerned with the relation between subtyping and subclassing and their influence on programming language design. [...] The type system of a language can be characterized as strong or weak and the type checking mechanism as static or dynamic. http://dl.acm.org/citation.cfm?id=97964

> GALILEO: a strongly-typed, interactive conceptual language. Galileo, a programming language for database applications, is presented. Galileo is a strongly-typed, interactive programming language designed specifically to support semantic data model features (classification, aggregation, and specialization), as well as the abstraction mechanisms of modern programming languages (types, abstract types, and modularization). http://dl.acm.org/citation.cfm?id=3859

> Design and implementation of an object-oriented strongly typed language for distributed applications. http://dl.acm.org/citation.cfm?id=99813

> Strongly typed heterogeneous collections. (Oleg Kiselyov et al.) http://dl.acm.org/citation.cfm?id=1017488

> Strongly typed genetic programming. Genetic programming is a powerful method for automatically generating computer programs via the process of natural selection [but] there is no way to restrict the programs it generates to those where the functions operate on appropriate data types. [When] programs manipulate multiple data types and contain functions designed to operate on particular data types, this can lead to unnecessarily large search times and/or unnecessarily poor generalization performance. Strongly typed genetic programming (STGP) is an enhanced version of genetic programming that enforces data-type constraints and whose use of generic functions and generic data types makes it more powerful than other approaches to type-constraint enforcement http://dl.acm.org/citation.cfm?id=1326695

The argument that the terms have no universal definition cannot be sound in light of their widespread use in computer science publications, even in the title and abstract. Perhaps what you mean to say is that the terms don't have a completely unambiguous or formal definition. That's probably true, but not all CS terms do. The words are contextual and exist on a spectrum, in the sense that a strongly-typed thing is typically in comparison to a more-weakly-typed thing [1]. However, the fact that they're widely used by CS researchers is why I think we should reject the argument that they don't have a universal definition or are not useful. CS researchers like Oleg Kiselyov use the term when describing their papers and characterizing their contributions.

[1] This is true for static and dynamic typing as well: they exist in degrees. Rust can verify type proofs that other languages can't regarding memory safety. Some languages can verify that integer indexes into an array won't go out of bounds. Thus it's not the case that a given language is either statically typed or dynamically typed; rather, each aspect of how it works can be characterized on a spectrum from statically verified to dynamically verified.