by blacktriangle 1715 days ago
Line noise is a red herring in the static/dynamic comparison. You will still run into serious problems trying to shove ugly human-generated data into your nice clean type system.

For mechanical things where you, the programmer, are building the abstractions (compilers, operating systems, drivers), this is a non-issue, but for dealing with the ugly real world, dynamic is still the way to go.


I'm not sure I understand what makes dynamic typing better for handling real-world data. Yes, the data is messy, but your program still has to be prepared to handle data in a specific shape or shapes. If you need to handle multiple shapes of data, e.g. varying JSON structures, you can still do that with static types, using sum types and pattern matching.
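As a rough sketch of the sum-type approach, assuming a hypothetical endpoint that sends a person's name either as `{ "name": ... }` or as `{ "first": ..., "last": ... }`: the shapes become one tagged union, a classifier decides which shape arrived, and a `switch` on the tag plays the role of pattern matching. The names `Person`, `classify` and `displayName` are illustrative only.

```typescript
// Two shapes the same (hypothetical) endpoint might send for a person's name.
type Person =
  | { kind: "single"; name: string }
  | { kind: "split"; first: string; last: string };

// Decide which shape arrived; anything else is rejected with undefined.
function classify(u: unknown): Person | undefined {
  if (typeof u !== "object" || u === null) return undefined;
  const { name, first, last } = u as { [key: string]: unknown };
  if (typeof name === "string") return { kind: "single", name };
  if (typeof first === "string" && typeof last === "string") {
    return { kind: "split", first, last };
  }
  return undefined;
}

// "Pattern matching" via a switch on the tag. The `never` assignment makes the
// compiler complain if a new variant is added to Person but not handled here.
function displayName(p: Person): string {
  switch (p.kind) {
    case "single":
      return p.name;
    case "split":
      return `${p.first} ${p.last}`;
    default: {
      const unhandled: never = p;
      return unhandled;
    }
  }
}

const p = classify(JSON.parse('{"first":"Ada","last":"Lovelace"}'));
if (p) console.log(displayName(p));
```

Supporting a third shape later means adding one variant; the exhaustiveness check then points at every `switch` that still needs a case.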
Most modern static languages have some way to hold onto dynamically typed objects, stuff them into containers of that dynamic type, and use some basic type reflection to dispatch the messy dynamic stuff off into the rest of the statically typed program. Sometimes it does feel fairly bolted on, but the problem of JSON parsing tends to force them all to have some reasonably productive way of handling that kind of data.
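In TypeScript, for instance, the "dynamic type" is `unknown`, and `typeof`, `Array.isArray` and `instanceof` serve as the basic reflection. A minimal sketch, where `bag` and `describeValue` are hypothetical names:

```typescript
// A heterogeneous container whose element types are only known at runtime.
const bag: unknown[] = [42, "hello", [1, 2, 3], { id: 7 }, null, new Date(0)];

// Runtime checks narrow `unknown` to a concrete type inside each branch,
// so the statically typed code after each check is fully type-checked.
function describeValue(v: unknown): string {
  if (typeof v === "number") return `number ${v}`;
  if (typeof v === "string") return `string of length ${v.length}`;
  if (Array.isArray(v)) return `array of ${v.length} items`;
  if (v instanceof Date) return `date ${v.toISOString()}`;
  if (typeof v === "object" && v !== null) {
    return `object with keys ${Object.keys(v).join(", ")}`;
  }
  return String(v); // null, undefined, booleans, etc.
}

console.log(bag.map(describeValue));
```

Because `unknown` (unlike `any`) permits no property access until a check has narrowed it, the compiler forces the dispatch to happen before the value is used.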
Yes, but this same argument works the other way: dynamically typed languages can do a half-assed impression of static languages as well. So it's a tradeoff depending on your domain.
Having programmed for 10 years in a fully dynamic language, though, I think I prefer it the other way around. You tend to wind up documenting all your types anyway, one way or another, either with doc comments or with policies around naming parameters, and you wind up building runtime type validation systems. Statically typed languages with cheats seem to get you to the right sort of balance much sooner.
The right balance really depends on your domain. The reason I'm so big on dynamic typing is because the most important part of the product I work on is a ton of reporting from an SQL database. I shift as much work as possible to the database, so the results that come back don't need to be unpacked into some domain model but are ready to go for outputting to the user. If I tried to do this in a static language I'd have a new type for every single query, then have to convince my various utility functions to work with my type zoo.
People seem to interpret blacktriangle's post in a parser setting. I don't know why, but if you're writing parsers, you're explicitly falling into the category he's mentioning where static types make a lot of sense.

GP's claim was that Java was too verbose. But verbosity isn't really the problem. There are tools for dealing with it. The problem is a proliferation of concepts.

A lot of business applications go like this: take a myriad of inputs through a complicated UI, transform them a bit, and send them somewhere else. With very accurate typing and a very messy setting (say, there's a basic model with a few concepts in it, and then 27 exceptions), you may end up modeling snowflakes with your types instead of thinking about how to actually solve the problem.

If you're referring to the "parse, don't validate" article, it's using the word in a different sense. The idea is that you make a data model that can only represent valid values, and the code that handles foreign data transforms it into that type for downstream code to handle, instead of just returning a boolean that says "yep, this data's fine".
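The classic illustration of that distinction is a non-empty list. Sketched in TypeScript (the names `NonEmptyArray`, `isNonEmpty`, `parseNonEmpty` and `head` are illustrative): the validator answers yes/no and throws the knowledge away, while the parser returns a value whose type records what was checked.

```typescript
// A tuple type that cannot represent an empty array.
type NonEmptyArray<T> = [T, ...T[]];

// "Validate": the caller learns a boolean, but its value is still typed T[].
function isNonEmpty<T>(xs: T[]): boolean {
  return xs.length > 0;
}

// "Parse": on success the result's type itself guarantees non-emptiness.
function parseNonEmpty<T>(xs: T[]): NonEmptyArray<T> | undefined {
  if (xs.length === 0) return undefined;
  const [first, ...rest] = xs;
  return [first, ...rest];
}

// Downstream code demands the refined type, so it never has to re-check.
function head<T>(xs: NonEmptyArray<T>): T {
  return xs[0];
}

const parsed = parseNonEmpty([3, 1, 2]);
if (parsed) console.log(head(parsed));
```

Passing a plain `number[]` to `head`, even inside `if (isNonEmpty(xs))`, is a compile error; only the output of `parseNonEmpty` is accepted.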
Right, but where this gets obnoxious is when you're writing code at the "edge", where customers can send you data and the formats you accept and process can change wholly and frequently. I've dealt with this problem before in a Scala setting where we created sealed traits to have Value classes for each of our input types, but it was obnoxious enough that adding a new form of input was pretty costly in implementation time, so much so that handling a new input format was something we planned for explicitly as a team. Sure, you could circumvent this by using something like Rust serde_json's Json Value type, but then you're basically rolling an unergonomic version of what you could do in a couple of lines of Python.

I've mostly come to the conclusion that dynamic languages work well wherever business requirements change frequently and codepaths are wide but shallow (i.e. many different codepaths, none of them particularly involved). Static languages work better for codepaths that are narrow but deep, where careful parsing at API edges and effective type-level modelling of data can create high-confidence software; in these situations the logic is often complicated enough that requirements just can't change that frequently. I wish we had a "best of both worlds" style to help where you have wide and deep codepaths, but alas that'll have to wait for more PLT (and probably for a time when we aren't fighting silly wars over dynamic vs. static typing as if one were wholly superior to the other).

I've found this to be a non-issue in Clojure with specification validation. Some call this gradual typing.
This has not been my experience.
Alexis King's "Parse, don't validate" is pretty much the final word on using type systems to deal with "messy real world data": https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

tl;dr when used properly, static type systems are an enormous advantage when dealing with data from the real world because you can write a total function that accepts unstructured data like a string or byte stream and returns either a successful result with a parsed data structure, a partial result, or a value that indicates failure, without having to do a separate validation step at all -- and the type system will check that all your intermediate results are correct, type-wise.
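A minimal sketch of such a total function, assuming a hypothetical input format of comma-separated numbers: every string maps to exactly one of the three outcomes (success, partial result, failure), nothing is thrown, and callers must handle each case before touching `value`. `ParseResult` and `parseNumberList` are illustrative names.

```typescript
// Three possible outcomes; only "ok" and "partial" carry a value.
type ParseResult<T> =
  | { status: "ok"; value: T }
  | { status: "partial"; value: T; problems: string[] }
  | { status: "error"; problems: string[] };

// Total: defined for every input string, never throws.
function parseNumberList(input: string): ParseResult<number[]> {
  const values: number[] = [];
  const problems: string[] = [];
  for (const raw of input.split(",")) {
    const token = raw.trim();
    const n = Number(token);
    if (token === "" || Number.isNaN(n)) {
      problems.push(`not a number: "${token}"`);
    } else {
      values.push(n);
    }
  }
  if (problems.length === 0) return { status: "ok", value: values };
  if (values.length > 0) return { status: "partial", value: values, problems };
  return { status: "error", problems };
}

const r = parseNumberList("1, x, 4");
if (r.status === "error") {
  console.log(r.problems); // no `value` exists on this branch
} else {
  console.log(r.value); // compiler knows a number[] is present here
}
```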

I’ve been using the techniques in that article for years in JavaScript, CL and Clojure. While static types are a notable part of it, the more important point is just learning to design your systems to turn incoming data to domain objects as soon as possible.
There are runtime analogs for most of the modeling techniques people use in statically typed languages.
You're not the first person, nor the fourth for that matter, to respond to advocacy of dynamic typing with that blog post, and it's an interesting post, but it misses the whole point. The problem is not enforcing rules on data coming in and out of the system. The problem is that I have perfectly valid collections of data that I want to shove through an information processing pipeline while preserving much of the original data, and static typing systems make this very powerful and legitimate task a nightmare.
Not a nightmare at all. For example, if you're doing JSON processing in Rust, serde_json gives you great tools for this. The serde_json::Value type can represent any JSON value. You parse JSON to specific typed structures when you want to parse-and-validate and detect errors (using #[derive(Serialize, Deserialize)] to autogenerate the code), but those structures can include islands of arbitrary serde_json::Values. You can also parse JSON objects into structs that contain a set of typed fields with all other "unknown" fields collected into a dynamic structure, so those extra fields are reserialized when necessary --- see https://serde.rs/attr-flatten.html for an example.
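For readers not using Rust, a rough TypeScript analogue of that flatten-style catch-all (not serde itself) can be built with object rest destructuring: the fields you care about are checked and typed, and everything else rides along untouched and is written back out. `Item`, `parseItem` and `serializeItem` are hypothetical names for illustration.

```typescript
// Typed fields we care about, plus every other field kept verbatim.
interface Item {
  id: number;
  name: string;
  extra: { [key: string]: unknown };
}

function parseItem(u: unknown): Item | undefined {
  if (typeof u !== "object" || u === null || Array.isArray(u)) return undefined;
  // Rest destructuring collects all fields not named explicitly.
  const { id, name, ...extra } = u as { [key: string]: unknown };
  if (typeof id !== "number" || typeof name !== "string") return undefined;
  return { id, name, extra };
}

function serializeItem(item: Item): string {
  const { extra, ...known } = item;
  // Unknown fields go back out alongside the (possibly modified) typed ones.
  return JSON.stringify({ ...extra, ...known });
}

const item = parseItem(JSON.parse('{"id":1,"name":"widget","color":"red","tags":["a"]}'));
if (item) console.log(serializeItem({ ...item, name: item.name.toUpperCase() }));
```

The pipeline code only ever sees `id` and `name` as typed values, yet `color` and `tags` survive the round trip.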
> The problem is that I have perfectly valid collections of data (...) and static typing systems make this very powerful and legitimate task a nightmare.

What leads you to believe that static typing turns a task that essentially boils down to input validation into "a nightmare"?

From my perspective, with static typing that task is a treat and all headaches that come with dynamic typing simply vanish.

Take for example Typescript. Between type assertion functions, type guards, optional types and union types, inferring types from any object is a trivial task with clean code enforced by the compiler itself.

Presumably the GP's data is external and therefore not checkable or inferrable by typescript. This makes the task less ideal, but still perfectly doable via validation code or highly agnostic typing.
> Presumably the GP's data is external and therefore not checkable or inferrable by typescript.

There is no such thing as external data that is not checkable or inferable by typescript. That's what type assertion functions and type guards are for.

With typescript, you can take in an instance of type any, pass it to a type assertion function or a type guard, and depending on the outcome either narrow it to a specific type or throw an error.
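A minimal sketch of both tools, assuming a hypothetical `User` payload. The input is typed as `unknown` rather than `any` here so the compiler insists on the check; the narrowing works the same way.

```typescript
// Hypothetical shape for illustration; nickname is optional.
interface User {
  id: number;
  email: string;
  nickname?: string;
}

// Type guard: returns a boolean, but the `x is User` return type lets the
// compiler narrow `x` in the branch where it returned true.
function isUser(x: unknown): x is User {
  if (typeof x !== "object" || x === null) return false;
  const { id, email, nickname } = x as { [key: string]: unknown };
  return (
    typeof id === "number" &&
    typeof email === "string" &&
    (nickname === undefined || typeof nickname === "string")
  );
}

// Assertion function: throws on bad input; after it returns normally the
// compiler treats `x` as a User for the rest of the enclosing scope.
function assertUser(x: unknown): asserts x is User {
  if (!isUser(x)) throw new TypeError("payload is not a User");
}

function greet(payload: unknown): string {
  if (isUser(payload)) {
    return `hello ${payload.nickname ?? payload.email}`; // narrowed to User
  }
  return "hello stranger";
}

function userId(payload: unknown): number {
  assertUser(payload);
  return payload.id; // narrowed to User after the assertion
}
```

The guard suits branching logic; the assertion suits code that should simply stop on malformed input.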

You said:

> inferring types from any object is a trivial task

This is true for values defined in code, but TypeScript cannot directly see data that comes in from eg. an API, and so can't infer types from it. You can give the data types yourself, and you can even give it types based on validation logic that happens at runtime, and I think this is usually worth doing and not a huge burden if you use a library. But it's disingenuous to suggest that it's free.

The closest thing to "free" would be blindly asserting the data's type, which is very dangerous and IMO usually worse than not having static types at all, because it gives you a false sense of security:

  const someApiData: any = { foo: 'bar' }

  function doSomethingWith(x: ApiData) {
    return x.bar + 12
  }

  type ApiData = {
    foo: string,
    bar: number
  }

  // no typescript errors! (at runtime, though, x.bar is undefined and this returns NaN)
  doSomethingWith(someApiData as ApiData)
The better approach is to use something like io-ts to safely "parse" the data into a type at runtime. But, again, this is not without overhead.
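To show the shape of that overhead without pulling in a dependency, here is a small hand-rolled decoder sketch in the same combinator spirit. It is not io-ts's actual API; `Decoded`, `Decoder`, `str`, `num` and `object` are hypothetical names. Applied to the `{ foo: 'bar' }` payload from the example above, it reports the missing field instead of silently producing `NaN`.

```typescript
// Outcome of decoding: a typed value, or a message saying where it went wrong.
type Decoded<T> = { ok: true; value: T } | { ok: false; error: string };
// A decoder inspects unknown input at runtime; `path` is only for error messages.
type Decoder<T> = (input: unknown, path?: string) => Decoded<T>;

function ok<T>(value: T): Decoded<T> {
  return { ok: true, value };
}
function fail(error: string): { ok: false; error: string } {
  return { ok: false, error };
}

const str: Decoder<string> = (u, path = "$") =>
  typeof u === "string" ? ok(u) : fail(`${path}: expected string`);

const num: Decoder<number> = (u, path = "$") =>
  typeof u === "number" ? ok(u) : fail(`${path}: expected number`);

// Builds a decoder for an object type from one decoder per field.
function object<T>(fields: { [K in keyof T]: Decoder<T[K]> }): Decoder<T> {
  return (u, path = "$") => {
    if (typeof u !== "object" || u === null || Array.isArray(u)) {
      return fail(`${path}: expected object`);
    }
    const rec = u as { [key: string]: unknown };
    const out: Partial<T> = {};
    for (const key of Object.keys(fields) as (keyof T & string)[]) {
      const r = fields[key](rec[key], `${path}.${key}`);
      if (!r.ok) return r; // stop at the first field that doesn't match
      out[key] = r.value;
    }
    return ok(out as T);
  };
}

type ApiData = { foo: string; bar: number };
const decodeApiData = object<ApiData>({ foo: str, bar: num });

const result = decodeApiData({ foo: "bar" });
if (result.ok) {
  console.log(result.value.bar + 12); // only reachable when bar really is a number
} else {
  console.log(result.error); // prints "$.bar: expected number"
}
```

The decoder and the type are declared separately here; libraries in this space typically derive the static type from the decoder so the two cannot drift apart.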
They don't, though.

For example: a technique I've used to work with arbitrary, unknown JSON values, is to type them as a union of primitives + arrays of json values + objects of json values. And then I can pick these values apart in a way that's totally safe while making no dangerous assumptions about their contents.

Of course this opens the door to lots of potential mistakes (though at least runtime errors are impossible), but it's 100% compatible with any statically typed language that has unions.
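A sketch of that technique in TypeScript: `JsonValue` is the recursive union, and the hypothetical helper `getPath` picks a value apart one checked step at a time, returning `undefined` whenever the data doesn't have the expected shape.

```typescript
// Any value JSON.parse can produce (without a reviver) fits this type.
type JsonValue =
  | null
  | boolean
  | number
  | string
  | JsonValue[]
  | { [key: string]: JsonValue };

// Walk a path of object keys / array indices. Each step checks the runtime
// shape first, so a wrong guess about the data yields undefined, not a crash.
function getPath(v: JsonValue, path: (string | number)[]): JsonValue | undefined {
  let cur: JsonValue | undefined = v;
  for (const step of path) {
    if (cur === undefined || cur === null) return undefined;
    if (typeof step === "number") {
      if (!Array.isArray(cur)) return undefined;
      cur = cur[step];
    } else {
      if (typeof cur !== "object" || Array.isArray(cur)) return undefined;
      // Own properties only, so keys like "toString" don't leak in from the prototype.
      cur = Object.prototype.hasOwnProperty.call(cur, step) ? cur[step] : undefined;
    }
  }
  return cur;
}

// Annotating JSON.parse's `any` result as JsonValue is sound given what JSON can encode.
const doc: JsonValue = JSON.parse('{"items":[{"price":3},{"price":"n/a"},{}]}');

let total = 0;
const items = getPath(doc, ["items"]);
if (Array.isArray(items)) {
  for (const item of items) {
    const price = getPath(item, ["price"]);
    if (typeof price === "number") total += price; // skip anything that isn't a number
  }
}
console.log(total);
```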

At the edges sure, but why allow that messiness to pervade the system instead of isolating it to the data consuming/producing interfaces?
You still have to deal with the ugliness in a dynamic language too, but you might be tempted to just let duck typing do its thing, which could lead to disastrous results. Otherwise, you'll have to check types and invariants, at which point you might as well parse the input into type-safe containers.