Hacker News
by PaulHoule 740 days ago
One of the interesting tradeoffs in programming languages is compile speed vs everything else.

If you've ever worked on a project with a 40-minute build (me) you can appreciate a language like Go that puts compilation speed ahead of everything else. Lately I've been blown away by the "uv" package manager for Python, which not only seems to be the first correct one but is also so fast that I'm left wondering if it really did anything.

On the other hand, there's a less popular argument that the focus on speed is a reason why we can't have nice things and, for people working on smaller systems, languages should be focused on other affordances so we have things like

https://www.rebol.com/

One area I've thought about a lot is the design of parsers. For instance, there is a drumbeat you hear about Lisp being "homoiconic", but if you had composable parsers, your language exposed its own parser, and every parser also worked as an unparser, you could do magical metaprogramming with an ease similar to Lisp's. Python almost went there with PEG but stopped short of it being a real revolution because of... speed.
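To make the "every parser is also an unparser" idea concrete, here is a minimal sketch in Python. The names `Syntax`, `Lit`, `Digits`, and `Seq` are hypothetical and invented for this illustration; this is not how Python's PEG parser works or what it exposes. Each grammar fragment carries both directions, and combining fragments combines both directions at once.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


class Syntax:
    """A grammar fragment that can parse text and print values back (hypothetical API)."""

    def parse(self, text: str, pos: int) -> Optional[Tuple[Any, int]]:
        raise NotImplementedError

    def unparse(self, value: Any) -> Optional[str]:
        raise NotImplementedError


@dataclass
class Lit(Syntax):
    """Matches a fixed string; it carries no value of its own."""
    s: str

    def parse(self, text, pos):
        return (None, pos + len(self.s)) if text.startswith(self.s, pos) else None

    def unparse(self, value):
        return self.s


class Digits(Syntax):
    """Matches a non-negative decimal integer."""

    def parse(self, text, pos):
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    def unparse(self, value):
        return str(value) if isinstance(value, int) and value >= 0 else None


@dataclass
class Seq(Syntax):
    """Runs several fragments in order; the value is the list of their values."""
    parts: tuple

    def parse(self, text, pos):
        values = []
        for part in self.parts:
            result = part.parse(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    def unparse(self, value):
        if len(value) != len(self.parts):
            return None
        pieces = [part.unparse(v) for part, v in zip(self.parts, value)]
        return None if None in pieces else "".join(pieces)


# One grammar definition, used in both directions.
pair = Seq((Lit("("), Digits(), Lit(","), Digits(), Lit(")")))
tree, end = pair.parse("(3,14)", 0)
tree[1] += 1                      # edit the parsed structure...
rewritten = pair.unparse(tree)    # ...and print it back as source text
```

The point of the design is that a metaprogram can parse, edit the tree, and unparse without a separately maintained pretty-printer. Note that this toy version does not preserve whitespace or leading zeros, which a real tool would have to address.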

As for the kind of problem he's worried about (algorithms that don't scale) one answer is compilation units and careful caching.

6 comments

> One of the interesting tradeoffs in programming languages is compile speed vs everything else.

In the case of Rust it's more of a cultural choice. Early people involved in the language pragmatically put everything else (correctness, ability to ship, maintainability, etc.) before compilation speed. Eventually the people attracted to contribute to the language weren't the sort that prioritized compilation speed. Many of the early library authors reflected that mindset as well. That compounds and eventually it's very difficult to crawl out from under.

I suspect the same is true for other languages as well. It's not strictly a bad thing. It's a tradeoff but my point is that it's less of an inevitability than people think.

I think most people haven't used many languages that prioritize compilation speed (at least for native languages) and maybe don't appreciate how much it can help to have fast feedback loops. At least that's the feeling I get when I watch the debates about whether Go should add a bunch more static analysis or not--people argue like compilation speed doesn't matter at all, while _actually using Go_ has convinced me that a fast feedback loop is enormously valuable (although maybe I just have an attention disorder and everyone else can hold their focus for several minutes without clicking into HN, which is how I got here).
A fast hot code swap of a module is a more important feature in my opinion, but it is somehow even harder to find these days than a fast compilation speed language.

But ideally I want both.

I'd really like to see a careful compiler like Rust have a "fast and loose" mode for the development loop, whereby I swear on my mother's grave that I won't break any rules, and the compiler in turn stops making expensive checks.

This would of course be for development only, not for releasing.

Funnily enough java has both insanely fast compile times, and hot swapping, while being more expressive than Go.
Java's compile times are "insanely fast" because it's not actually compiling to native code, it's compiling to JVM byte code which is actually compiled at runtime. And it is one of the rare languages that manages to be more expressive than Go while also being quite a lot less ergonomic. (:
Go is significantly more verbose, and only recently could one implement a goddamn map without hardcoding it in the compiler. Besides all the beautiful, unreadable `if err` “error handling” that makes it all too easy to silently ignore errors, and design mistakes like defer being function-scoped, it’s hardly something I would call ergonomic.
I wouldn't call the compile times super fast, but they were not that bad. I brought up the hot code swap thing because I did use it a lot when I was doing Java development.
The compile itself is very fast, but the build tools tend to think a bit when not hot/do some additional stuff besides building. But the actual time spent in `javac` is very short (partially because it only outputs very high level byte code)
This is a direction I've been pushing in partly because I'm using a significantly slower type inference algorithm in my language. I'm hoping with that and focusing on separate compilation I'll be able to keep the fancy inference without sacrificing the UX too much
When I used to program Java I absolutely loved hot code swap and was always amazed how little people even knew it was possible.

If you have a massive codebase no matter how fast your compiler is, re-compiling is going to be slow. But hot code swap is even better in that you can keep any state around without having to set it up all over again.

In Java I could change a method implementation with the program running and as long as I didn't touch my class state it would just work. Re-compilation was slow, but hot code swap was _fast_ and I maybe did a recompile 3-5 times per day total.

Bear in mind that the underlying protocol used to request hot swap on Java is a lot more expressive than the standard HotSpot implementation is. If you use the Jetbrains Runtime (a fork of OpenJDK) or the GraalVM "Espresso" VM (a.k.a. Java on Truffle) then you can do way more hotswapping than you'd be able to normally.

Espresso goes further and doesn't only allow hot swapping but lets you write plugins that react to hot swaps of code:

https://www.graalvm.org/latest/reference-manual/java-on-truf...

If you use the Micronaut web framework then it will selectively re-initialize your app in response to hot swaps that need it. Pretty advanced stuff.

In the case of Rust the fast feedback loop is facilitated by the `cargo check` command which halts compilation after typechecking. Unlike in Swift the typechecking phase in Rust is not a significant contributor to compilation times and so skipping code generation, optimization, and linking is sufficient for subsecond feedback loops.
I mean, you still need to run code at the end of the day. Yeah, the type checker will update your IDE quickly enough, but you still need to compile and link at least a debug build in order to meaningfully qualify as a feedback loop IMHO.
This was my initial mindset as someone whose background lies in untyped languages, but after time with Rust I no longer feel that way. My feeling now is that seeing a Rust codebase typecheck gives me more confidence than seeing a Python or Javascript codebase pass a test suite. Naturally I am still an advocate for extensive test suites but for my Rust code I only run tests before merging, not as a continuous part of development.

To give an example, in the past week I have ported over a thousand lines of C code to a successor written in Rust. During development compilation errors were relatively frequent, such as size mismatches, type mismatches, lifetime errors, etc. I then created a C-compatible interface and plugged it into our existing product in order to verify it using our extensive integration test suite, which takes over 30 minutes to run. It worked the first time. In order to ensure that I had not done something wrong, I was forced to insert intentional crashes in order to convince myself that my code was actually being used. Running that test suite on every individual change would not have yielded a benefit.

> This was my initial mindset as someone whose background lies in untyped languages

Yes, I understand and agree regarding Rust vs dynamic languages, but to be clear my remark was already assuming type checking. I still think you need a full iteration loop even if a type checker gets you a long ways relative to a dynamic language.

Meanwhile, others have had languages more expressive than Go that compiled blazingly fast on 1990s hardware, with more features than Go will ever get.
I've never understood why people think "more features" is a flex. "Faster compile times" isn't even the primary benefit of fewer features, it's just gravy. More features, even with fast compile times, is a failure (which is probably why most of those "more expressive languages" are no longer with us unless one includes the JIT languages and--disingenuously--only measures the AOT compilation).

EDIT: wow, a downvote within literally 2 seconds of posting!

Do those languages command even a single percentage of marketshare combined?
Most languages are forced to choose between tooling speed and runtime speed, but Python has historically dealt with this apparent dichotomy by opting for neither. (⌐▨_▨)
Python's real strength is the speed at which it can be taught, read, and written.
I wish this were true. I've used Python professionally for more than a decade and I still don't consider myself an expert (but I consider myself an expert in Go despite having only used it professionally for a few years). A few things off the top of my head that I still don't understand expertly and yet they chafe me quite often: how imports are resolved, how metaclasses work, how packaging works, etc.

And on the beginner end, even simple things like "distributing a simple CLI program" or "running a simple HTTP service" are complicated. In the former case you have to make sure your target environment has the right version of Python installed and the dependencies and the source files (this can be mitigated with something like shiv or better yet an OS package, but those are yet another thing to understand). In the latter case you have to choose between async (better take care not to call any sync I/O anywhere in your endpoints!) or an external webserver like uwsgi. With Go in both cases you just have to `go build` and send the resulting static, native binary to your target and you're good to go.

And in the middle of the experience spectrum, there's a bunch of stuff like "how to make my program fast", or "how do I ensure that my builds are reproducible", or "what happens if I call a sync function in an async http endpoint?". In particular, knowing why "just write the slow parts in multiprocessing/C/Rust/Pandas" may make programs _slower_. With Go, builds are reproducible by default, naively written programs run about 2-3 orders of magnitude faster than in Python, and you can optimize allocations and use shared memory multithreading to parallelize (no need to worry if marshaling costs are going to eat all of your parallelism gains).

"Python is easy" has _never_ been true as far as I can tell. It just looks easy in toy examples because it uses `and` instead of `&&` and `or` instead of `||` and so on.

It’s not, actually, any more than any other language. That was Guido’s original plan, but show a page of modern Python code to someone who’s never seen it before and they’ll run screaming. There is a minimal subset where you can say it reads like pseudocode, but that’s a very limited subset, and, like AppleScript, you have to have a fair amount of knowledge to be able to write it fluently.
I am more and more convinced that type checked Python is not always the best idea. The people who are the most virulently pro type checking in Python are not data science folks.

Python's type ecosystem's support for proper type checked data science libraries is abysmal (`nptyping` is pretty much the most feature complete, and it too is far from complete), and has tons of weird bugs.

The Array API standard (https://data-apis.org/array-api/latest/purpose_and_scope.htm...) is a step in the right direction, but until that work is close to some sort of beta version, data science folks will have tons of type errors in their code, in spite of trying their best.

It is. Compared to other languages (short of JS without Symbols and the async/await/promises mumbo jumbo, or Lisp) it has a much lower barrier to entry.
Python absolutely has async/await/promises, and it's actually quite a lot worse than JavaScript in this regard because Python _also_ has synchronous APIs and _no_ tooling whatsoever to make sure you don't call a sync API in an async function thereby blocking your event loop (which, if your application is a networked service with any amount of traffic at all, will typically result in a cascading failure in production). I'm no great fan of JavaScript, and I've written _wayyyyy_ more Python than JS, but async/await/promises is exactly the wrong example to make the case that Python is better.
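A minimal sketch of the failure mode described above, using only the standard library. The handler and helper names are hypothetical. A heartbeat coroutine records how long the event loop goes without running it, first next to a handler that calls a sync API directly and then next to one that pushes the same call onto a worker thread with `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
import time


async def heartbeat(ticks: list) -> None:
    """Record a timestamp roughly every 10 ms, whenever the loop lets us run."""
    for _ in range(5):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def blocking_handler() -> None:
    # A sync call inside a coroutine: nothing else on the loop runs meanwhile.
    time.sleep(0.2)


async def friendly_handler() -> None:
    # The same sync call, moved to a worker thread so the loop stays responsive.
    await asyncio.to_thread(time.sleep, 0.2)


async def longest_gap(handler) -> float:
    """Run the heartbeat next to a handler; return the longest pause between ticks."""
    ticks: list = []
    await asyncio.gather(heartbeat(ticks), handler())
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


blocked = asyncio.run(longest_gap(blocking_handler))
friendly = asyncio.run(longest_gap(friendly_handler))
print(f"longest stall with sync call:  {blocked:.3f}s")
print(f"longest stall with to_thread: {friendly:.3f}s")
```

Nothing in the language stops the first handler from being written, and in a service every other request on that loop waits out the stall.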
Structured concurrency with exception groups makes it untouchable for JS. If JS implements structured concurrency like Python, Java, and Kotlin, then maybe it can be viable.
It is an illusion that Python is like BASIC; in reality Python 3.12 is rather complex, more like Common Lisp, when taking into account all the language-breaking changes during the last 30 years, its capabilities, the standard library, and key libraries in the ecosystem.
Almost every modern language with a well-specified runtime looks a lot like Common Lisp because Common Lisp was one of the first languages specified by adults and the Common Lisp spec had a lot of influence on future languages like Python and Java. For instance most languages have a set of data types in the language or the standard library, such as bignums, that are similar to what CL has.
What Python has in its locker is progressive disclosure of complexity.
Python’s real strength is that it has a vast ecosystem of uber powerful libraries written in C/C++.
Shared by any language with FFI capabilities.
In theory. In practice people are very happy with what happens when you

  import pandas
in Python, more so than competitors. I have been hoping though that with the planned Java transition to FFI, you can make Jython pass Python FFI through Java API to get numpy and all that working.
More a side effect from teaching materials than anything else, though.

Java still has the issue of the time Valhalla is taking.

Having used Turbo Pascal, Delphi, Modula-2, Active Oberon, Eiffel, D, OCaml, I really don't appreciate that Go puts compilation speed ahead of everything else.

Those languages show one can have both expressive type systems and fast compilation turnarounds, when the authors aren't into the anti-PhD-level-languages kind of sentiment.

I'm the author of https://bolinlang.com/. Go is an order of magnitude slower. I have some ideas why people think Go is fast, but there's really no excuse for a compiler to be slower than it. Even gcc is faster than Go if you don't include too many headers. Try compiling sqlite if you don't believe me. Last time I checked you could compile code in <100ms when using SDL3 headers (although not SDL2).
> If you've ever worked on a project with a 40 minute build

I've worked on plenty of C++ codebases that had a 2-day build time!

If you were lucky, incremental builds only took a few hours.

I wonder what these projects do wrong. Can't they be split into different dynamically loaded libraries with just the headers exposed?
I remember hearing that Microsoft was excited to get full NT builds below 24 hours.
Windows Mobile and Office is where I did my time. :D
> One of the interesting tradeoffs in programming languages is compile speed vs everything else.

Yes, but I don't think that compile speed has really been pushed aggressively enough to properly weigh this tradeoff. For me, compilation speed is the #1 most important priority. Static type checking is #2, significantly below #1 and everything else I consider low priority.

Nothing breaks my flow like waiting for compilation. With a sufficiently fast compiler (and Go is not fast enough for me), you can run it on every keystroke and get realtime feedback on your code. Now that I have had this experience for a while, I have completely lost interest in any language that cannot provide it no matter how nice their other features are.

I think this is a false choice. It comes from the way we design compilers today.

When you recompile your program, usually a tiny portion of the lines of code have actually changed. So almost all the work the compiler does is identical to the previous time it compiled. But, we write compilers and linkers as batch programs that redo all the compilation work from scratch every time.

This is quite silly. Surely it’s possible to make a compiler that takes time proportional to how much of my code has changed, not how large the program is in total. “Oh I see you changed these 3 functions. We’ll recompile them and patch the binary by swapping those functions out with the new versions.” “Oh this struct layout changed - these 20 other places need to be updated”. But the whole rest of my program is left as it was.

I don’t mind if the binary is larger and less efficient while developing, so long as I can later switch to release mode and build the program for .. well, releasing. With a properly incremental compiler, we should be able to compile small changes into our software more or less instantly. Even in complex languages like Rust.
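As a rough sketch of "work proportional to the change", here is the bookkeeping such a compiler would need, in Python. The function names and the unit/dependency data are hypothetical; this is not how any particular compiler does it. It fingerprints each unit, finds the ones whose text changed since the last build, and walks the reversed dependency graph to find everything downstream.

```python
import hashlib
from collections import deque


def fingerprint(source: str) -> str:
    """Content hash of one compilation unit's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def units_to_rebuild(sources: dict, deps: dict, previous: dict) -> set:
    """Return the units whose compiled output may be stale.

    sources:  unit name -> current source text
    deps:     unit name -> set of units it depends on
    previous: unit name -> fingerprint recorded at the last build
    """
    # Units that are new or whose own text changed since the last build.
    changed = {u for u, text in sources.items() if previous.get(u) != fingerprint(text)}

    # Invert the edges so we can walk from a unit to the units that use it.
    dependents: dict = {u: set() for u in sources}
    for unit, needs in deps.items():
        for dep in needs:
            dependents.setdefault(dep, set()).add(unit)

    # Breadth-first walk: anything downstream of a change is dirty too.
    dirty, queue = set(changed), deque(changed)
    while queue:
        for user in dependents.get(queue.popleft(), ()):
            if user not in dirty:
                dirty.add(user)
                queue.append(user)
    return dirty


sources = {"util": "def f(): return 1", "parser": "import util",
           "cli": "import parser", "docs": "plain text"}
deps = {"parser": {"util"}, "cli": {"parser"}}
previous = {unit: fingerprint(text) for unit, text in sources.items()}

unchanged = units_to_rebuild(sources, deps, previous)
sources["parser"] = "import util  # edited"
after_edit = units_to_rebuild(sources, deps, previous)
```

This version is deliberately conservative: it rebuilds every dependent of a changed unit. A smarter scheme would stop propagating when a unit's public interface is unchanged, which is where language semantics start to matter.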

> But, we write compilers and linkers as batch programs that redo all the compilation work from scratch every time.

I don't think that there are that many production-level compilers that don't perform the kind of caching that you're advocating for. Part of the problem is what the language semantics are. https://www.pingcap.com/blog/rust-huge-compilation-units/ gives an example of this.

> Surely it’s possible to make a compiler that takes time proportional to how much of my code has changed, not how large the program is in total.

Language design also affects what can be done. For example, Rust relies a lot on monomorphisation, which in turn makes it much harder (not necessarily impossible) to do in-place patching, but in a language like Java or Swift, where a lot of checks can be relegated to runtime, it becomes much easier to do that kind of patching.

I think that there's a lot left to be done to get closer to what you want, but changing a compiler that has users in such an extensive way is a bit like changing the engine of a plane while it's flying.

Yes, rust has an incremental compilation mode that is definitely faster than compiling the whole program from scratch. But linking is still done from scratch every time, and that gets pretty slow with big programs.

I agree that it would be a lot of work to retrofit llvm like this. But personally I think that effort would be well worth it. Maybe the place to start is the linker.

You're in good company: multiple people want to experiment with taking ownership of the linking and codegen steps to enable this kind of behavior. I would be more than happy to see that happen. I feel that the problem is that a project of that magnitude requires a benefactor so that a small group of people can work on it full time for maybe 2 years. Those don't come along often. The alternative is that these projects don't happen, lose steam, or take a really long time.
Given that VC++ does incremental linking, it seems to be the usual case of needing someone who cares enough to sort it out.
That's called incremental compilation and is fully supported by languages like Java or Kotlin. The JVM also supports this, being able to recompile methods on the fly whilst the program runs. And the IDE plugins for these types of languages are actually "presentation compilers". They do all the work of compiling and type checking, except for code emission, and they run fast enough to run continuously in the background on every keystroke.
Not only those: also some C++ compilers like VC++, as well as Eiffel, the CLR, Common Lisp, ...

Just as an addendum.

Agreed. For example, Julia (which is a compiled language) has a package called Revise, which provides incremental compilation. A cold start on a package / project / script will take a while, and even when dependencies are precompiled but the code you're working on is not, REPL startup takes a noticeable amount of time.

But once you have your REPL prompt, it's just: edit code, test it. Revise figures out what needs recompiling and does it for you. There are some limitations, most notably, any redefinition of a struct requires a reboot, but it's a great experience.

A lot of the current work going into the Zig compiler is to greatly increase the compile time of debug builds, by cutting the LLVM dependency, and then add incremental compilation. I'm looking forward to the fruits of that labor; I don't like to wait.

Is it baked in as the main mode yet? Or as an option to juliac?
s/increase/decrease
> Surely it’s possible to make a compiler that takes time proportional to how much of my code has changed [...]

My understanding is that this is how Eclipse's Java compiler works, but I'm not positive.

> Nothing breaks my flow like waiting for compilation. With a sufficiently fast compiler (and Go is not fast enough for me), you can run it on every keystroke and get realtime feedback on your code.

I already get fast feedback on my code inlined in my editor, and for most languages it only takes 1-2 seconds after I finish typing to update (much longer for Rust, of course). I've never personally found that those 1-2 seconds are a barrier, since I type way faster than I can think anyway.

By the time I've finished typing and am ready to evaluate what I've written, the error highlighting has already popped up letting me know what's wrong.

Yeah even with a large C++ codebase, any decent IDE will flag errors very quickly in real time. I dunno, I've never found that waiting a minute to run a test or whatever is particularly detrimental to my workflow.

I understand the benefits of super fast iteration if you're tweaking a GUI layout or something, but for the most part I'd prioritize many many other features first.

Fast compilation + unit tests allows me to quickly check if I broke a lot more stuff than IDE local checks.
> With a sufficiently fast compiler (and Go is not fast enough for me), you can run it on every keystroke and get realtime feedback on your code.

What language are you using?