| > Honest question, in the era of vibe and AI assisted coding is there any advantages of using untyped programming languages, apart from the fact that non-typed languages has more traning data for the LLM? Author here. Type systems restrict which programs can be expressed and increasing expressiveness often requires increasing type-system complexity (which, speaking from experience, both humans and agents will struggle with). Plus they are not the only mechanism to assert correctness (they only validate a subset of your program correctness and do not replace tests) and you are still on your own when it comes to actually recovering from unexpected errors (something Erlang/Elixir were designed for). I'd say there are two flip sides to your question: 1. Given types do not replace tests, if you can use AI to automate full test coverage, are there actual benefits in static typing for coding agents? The downside of tests for humans is that we suck at writing them (but guided agents can do better) and they can take time to run (which agents do not care) 2. Do we actually have any data or evaluations that show which typing discipline is better for agents? The only benchmark I am aware of [AutoCodeBenchmark] has Elixir come first (dynamic) and C# as second (static), so it doesn't answer the question. There are other benchmarks that show dynamic languages require fewer tokens to solve problems (but that's not a metric I particularly care about) My gut feeling is that local structure, documentation, quality and quantity in the training data, etc are likely to play a more important role than typing for coding agents. I'd also love to measure how agents perform on specific domains. If you are writing concurrent software, how does Elixir/Java/Rust/Go compare? But without data, it's hard to say. [AutoCodeBenchmark]: https://github.com/Tencent-Hunyuan/AutoCodeBenchmark |
I am actually writing a paper on this right now so nothing I can point you to yet but yes. LLMs are better (produce working code in fewer attempts controlling for the relative size of training corpus) when using type systems with inference and global unification. It is largely about the quality of the error feedback channel so languages with very good compiler errors (accurate, localized, include the correction with the failure) can close a lot of ground.
But inference + sound type system gives you a constraint propagation that genuinely restricts the ability of the LLM to get into trouble. Type systems that require annotation give up most of the benefit, since the annotations are themselves surface area for LLM mistakes. Unification also puts heavy limits on the expressiveness of the language which is a confounder and may actually be a big part of the benefit too.
Everyone has been on the "the training data is better" thing but I actually don't think so. All of the languages that people report as being better because of good training data actually have fairly restrictive type systems. Elixir is an exception, but it has exceptionally good error messages! And also, along with erlang, pretty unique runtime semantics that may contribute but that's outside my domain I'm on type systems. Debunking the training quality thing is not what I'm working on but I have deep suspicions about that common wisdom.