by jcrites 3715 days ago
That's a great analogy! However, I do think strongly typed vs. weakly typed has a role in thinking about this, just a different dimension than the one you're describing. Let's say we come across a JSON structure that looks like this:

  {"start": "2007-03-01"}
Is that a timestamp? Maybe! Does it support a time within the day? Perhaps I can write "2007-03-01T13:00:00" in ISO 8601 format if we're lucky. Can I supply a time zone? Who knows for sure? It's weakly typed data. The actual specification of the type of that field lives in a layer on top of JSON, if it's specified at all. It might be "specified" only in terms of what the applications that handle it can parse and generate. I could drop that value into Excel and treat it as all sorts of different things.
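
A minimal Python sketch of that ambiguity, using only the standard library. Choosing `datetime.fromisoformat` as the interpretation is an assumption made for illustration; nothing in the JSON itself says the field is a timestamp.

```python
import json
from datetime import datetime

# The JSON parser only knows that this value is a string.
doc = json.loads('{"start": "2007-03-01"}')
start = doc["start"]
print(type(start).__name__)  # str

# Reading it as a timestamp is the application's decision, not JSON's.
# ISO 8601 parsing is just one interpretation the application might pick.
as_date = datetime.fromisoformat(start)
as_datetime = datetime.fromisoformat("2007-03-01T13:00:00")
print(as_date.isoformat(), as_datetime.isoformat())
```

Whether a producer is allowed to send the second form, or a time zone, is something only the applications on either end can answer.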

Ion by comparison has a specific data type for timestamps defined in the spec [1]. The timestamp has a canonical representation in both text and binary form. For this reason, I know that "2007-02-23T20:14:33.Z" and "2007-02-23T12:14:33.079-08:00" are valid Ion timestamp text values. In this instance I would describe Ion as strongly typed and JSON as weakly typed. Or, as the Ion documentation puts it, "richly typed".

To make an analogy, weakly typed is the Excel cell that can store whatever value you put in it, or the PHP integer 1 which is considered equal to "1" (loose equality). Strongly typed is the relational database row with a column described precisely by the table schema. Weakly typed is the CSV file; strongly typed is the Ion document.

[1] http://amznlabs.github.io/ion-docs/spec.html


Ion has more data types than JSON, it's true. Ion has a timestamp type and JSON does not, so you could say it's "richer" if you want, but that just means "it has more types."

However, I don't think it's accurate to say that the typing of Ion is any "stronger." Both Ion and JSON are fully dynamically typed, which means that types are attached to every value on the wire. It's just that without an actual timestamp type in JSON, you have to encode timestamp data into a more generic type.

The notions of "strong" and "weak" typing have never been particularly well-defined, but I think my usage is in line with their usual meaning: https://en.wikipedia.org/wiki/Strong_and_weak_typing

> Some programming languages make it easy to use a value of one type as if it were a value of another type. This is sometimes described as "weak typing".

Strong typing makes it difficult to use a value of one type as if it were another. In PHP, you can compare the integer value 1 to the string value "1" and the equality test returns boolean true. Conflating integer 1 and string "1" is weak typing. A data format that expresses the concept of the timestamp 1999-12-31T23:14:33.079-08:00 using the same fundamental type as the string "Party like it's 1999!" is what I would call weakly typed.

Ion does not make it easy to use a string as if it were a timestamp or vice versa. It has types like arbitrary precision decimals, or binary blobs, that can't easily be represented in a strongly-typed way in JSON. You can certainly invent a representation, like specifying strings as ISO 8601 for timestamps, or an array of numbers for binary -- actually, wait, how about a base64-encoded string instead? Where there's choice there's ambiguity. These concepts of "type" live in the application layer in JSON, instead of in the data layer like they do in Ion.
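
To illustrate the "where there's choice there's ambiguity" point, here is a small Python sketch that encodes the same four bytes under both conventions. The field name `data` and the sample bytes are made up for the example.

```python
import base64
import json

payload = b"\xde\xad\xbe\xef"

# Convention A: a base64-encoded string.
as_base64 = json.dumps({"data": base64.b64encode(payload).decode("ascii")})
# Convention B: an array of byte values.
as_array = json.dumps({"data": list(payload)})

print(as_base64)  # {"data": "3q2+7w=="}
print(as_array)   # {"data": [222, 173, 190, 239]}

# Both documents are valid JSON. A reader has to know out of band which
# convention the writer chose before it can recover the bytes.
from_base64 = base64.b64decode(json.loads(as_base64)["data"])
from_array = bytes(json.loads(as_array)["data"])
```

Nothing on the wire distinguishes the base64 string from an ordinary string, or the array from a plain list of small integers.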

Note as well that "stronger" is my term. The Ion documentation says "richly-typed". Certainly Ion does not include every type in the world. Perhaps a future serialization framework might capture "length" with a unit of "meters", or provide a currency type with unit "dollars", and if that existed I'd call it stronger-(ly?)-typed or more richly typed than Ion. In that case, the data layer would prevent you from accidentally treating "3 inches" as "3 centimeters", since those would be different types. That would be stronger typing than an example where you simply have the integer 3, and it's the application's job to track which integers represent inches, and which represent centimeters. So perhaps "strong" and "weak" are not the best terms, so much as "stronger" and "weaker".
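
As a rough sketch of what unit-carrying types could look like, here is a Python version at the application level. The `Inches` and `Centimeters` classes and the `to_centimeters` helper are hypothetical names invented for this example; they are not part of Ion or of any serialization format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inches:  # hypothetical unit type, for illustration only
    value: float

    def __add__(self, other):
        # Only lengths in the same unit may be added.
        if not isinstance(other, Inches):
            return NotImplemented
        return Inches(self.value + other.value)


@dataclass(frozen=True)
class Centimeters:  # hypothetical unit type, for illustration only
    value: float

    def __add__(self, other):
        if not isinstance(other, Centimeters):
            return NotImplemented
        return Centimeters(self.value + other.value)


def to_centimeters(length: Inches) -> Centimeters:
    """The only way across units is an explicit, deliberate conversion."""
    return Centimeters(length.value * 2.54)


print(Inches(3) == Centimeters(3))  # False: same number, different types

try:
    Inches(3) + Centimeters(3)
except TypeError:
    print("mixing units is rejected")
```

With a bare integer 3, none of these checks exist, and tracking the unit is left entirely to the application.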

By your definition, any language with strings is weakly typed, since you can always interpret a string as being something else. Strongly/weakly typed has never been a particularly useful description (as the page you linked notes), and I think it's particularly unhelpful here.
> By your definition, any language with strings is weakly typed, since you can always interpret a string as being something else

No, I wouldn't say that's the case. For example, in PHP you can literally write:

  if (1 == "1") { ...
... and the condition evaluates to true. You can do similar things in Excel; Excel doesn't even really differentiate between those two values in the first place. (At least that's how it seems as a casual user.)

This is not the case in strongly typed programming languages that have strings, such as C++ or Java. You can convert from one type to another, sure, by explicitly invoking a function like atoi() or Integer.toString(), but the conversion is deliberate, and so the language is strongly typed. A variable containing a string (java.lang.String) cannot be compared against one containing a timestamp (java.util.Date) by accident. An Ion timestamp is a timestamp and can't be conflated with a string, although it can be converted to one.
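
The same contrast can be sketched in Python, which also refuses to conflate these types implicitly. Python stands in here for the C++ and Java cases above; the helper name `compare_mixed` is made up for the example.

```python
from datetime import datetime

# Unlike PHP's loose equality, no implicit coercion happens here.
loose_equal = (1 == "1")
print(loose_equal)  # False

# The conversion has to be requested explicitly, like atoi() or
# Integer.toString() in the comment above.
explicit_equal = (int("1") == 1)
print(explicit_equal)  # True


def compare_mixed():
    """Ordering a string against a timestamp is a type error, not a guess."""
    return "2007-03-01" < datetime(2007, 3, 1)


try:
    compare_mixed()
except TypeError:
    print("str and datetime cannot be ordered against each other")
```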

Edit: The set of types that are built in, in conjunction with how those types are expressed in programming languages (e.g. timestamp as java.util.Date, decimal as java.math.BigDecimal, blob as byte[]), is why I'd call Ion strongly typed or richly typed in comparison to JSON. Specifically, scalar values that frequently appear in common programs can be expressed with distinctly typed scalar values in Ion. I don't know if there's a good formal definition. You could probably define a preorder on programming languages or data formats based simply on the number of distinct scalar or composite types (so in that sense, yes, it's the fact that Ion has more). However, it goes beyond that. Subjectively, it's about how often you have to, in practice, convert from one type to another in common tasks. There is no clear way to represent an arbitrary-precision decimal in JSON, or a byte array, or a timestamp -- so you must "compress" those types down into a single JSON type like string-of-some-format or array-of-number. Several different scalar types must then all map to that same JSON type, which creates the risk of conflating values that have different logical types but the same physical JSON type. There's no obvious or built-in way to reconstruct the original type with fidelity. There's no self-describing path from "1999-12-31T23:14:33.079-08:00" or "DEADBEEFBASE64" back to those original types.
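
A short Python sketch of that "compression": three logically different values go in, and three indistinguishable JSON strings come out. The field names and the choice of ISO 8601 and base64 as encodings are assumptions for the example, not anything JSON prescribes.

```python
import base64
import json
from datetime import datetime, timedelta, timezone

timestamp = datetime(1999, 12, 31, 23, 14, 33, 79000,
                     tzinfo=timezone(timedelta(hours=-8)))
blob = b"\xde\xad\xbe\xef"
text = "Party like it's 1999!"

# Each logical type is squeezed into the one JSON type that can hold it.
wire = json.dumps({
    "timestamp": timestamp.isoformat(timespec="milliseconds"),
    "blob": base64.b64encode(blob).decode("ascii"),
    "text": text,
})

decoded = json.loads(wire)
print({key: type(value).__name__ for key, value in decoded.items()})
# {'timestamp': 'str', 'blob': 'str', 'text': 'str'}
```

After the round trip, only out-of-band knowledge tells the reader which strings to parse as a timestamp or decode as bytes.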

I subjectively call JSON weakly typed because its types are not adequate to uniquely store the common scalar data types that I work with in the programs I write. I call Ion strongly typed because it typically can. I acknowledged earlier that a data format would be even more strongly typed if it were capable of representing not just the type "integer", but "integer length meters". Ion does not have this kind of type built in, though its annotations feature could be used to describe that a particular integer value represents a length in meters.

> You can't misuse any kind of Ion value that is a string as if it were a timestamp without performing an explicit conversion.

The same is true of JSON. There is no difference, except that Ion has a timestamp type and JSON does not.

If you disagree, please identify what characteristic of Ion's design makes it more strongly typed than JSON, other than the set of types that is built in.

You are choosing a definition of strong typing that supports your argument, but the argument is over the meaning of strong typing to begin with. It's not as if there's some universally accepted definition of strong typing. Like functional programming, functional purity, object oriented, etc.—none of these terms are universally defined.
I generally agree, except the "type" of JSON numbers isn't well-defined with respect to precision and binary-vs-decimal floating point representation. An application that cares deeply about either aspect of numbers can't rely on JSON alone to ensure that the values are properly interpreted by all consumers.
That is a good point in that it is a very accurate reading of the JSON spec. In practice many (even most) JSON implementations don't give applications access to any precision beyond what an IEEE double can represent. So while you may take advantage of arbitrary precision in JSON and be fine according to the spec, your users will probably suffer data loss unless they are very picky about what JSON library they use. For example, JSON.parse() in JavaScript is out.
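
The data loss is easy to reproduce. This sketch uses Python's standard `json` module, whose default float handling has the same IEEE double limitation; the sample number is arbitrary.

```python
import json
from decimal import Decimal

text = "1.0000000000000000001"

# Default parsing goes through an IEEE double and silently drops digits.
as_double = json.loads(text)
print(as_double == 1.0)  # True

# Asking for Decimal explicitly keeps every digit that was on the wire.
as_decimal = json.loads(text, parse_float=Decimal)
print(as_decimal)  # 1.0000000000000000001
```

The second call only helps if every consumer knows to opt in, which is the "picky about the JSON library" problem described above.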
It's more than just precision, it's making sure that the same value comes out that went in, and that things haven't been subtly altered via unintended conversions between decimal and binary floating-point representations. Obviously this is quite important when you've got both text and binary formats.

Some applications really need decimal values, and some really need IEEE floats. Ion can accurately and precisely denote both types of data, making it easier to ensure that the data is handled properly by both reader and writer.
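
For anyone unsure why the distinction matters, here is a small Python sketch. Python's `float` and `decimal.Decimal` are used as stand-ins for binary and decimal number types; this is an analogy, not Ion's own API.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact representation.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)  # False

# Decimal arithmetic: the values are exactly what was written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True

# Converting a binary float to decimal exposes the value actually stored,
# which is the kind of subtle alteration an unintended conversion causes.
stored = Decimal(0.1)
print(stored == Decimal("0.1"))  # False
```

A format that records which of the two kinds a value is lets the reader pick the matching type instead of guessing.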