by huijzer 1135 days ago
I used to have a reasonably simple notebook for a paper which took about 35 minutes to compile on an old university-provided CPU, even when opening it for the second time.

Therefore, I'm really excited for the improvements in code caching! Thanks to Tim Holy, Jameson Nash, Valentin Churavy, and others for your work.


> Reasonably simple

> 35 minutes to compile

What kind of CPU are we talking about here!?

Probably a decent one. Compiling even small projects easily took hours as of a year ago, all while people were screaming about how good it was. The equivalent code in other languages would be statically compiled in milliseconds. The binaries were also thousands of times bigger...
It would be great if you could share such an example workload. Examples that big are few and far between, even though they are VERY good for debugging compilation performance.
Grab any project you have with more than 2k lines and multiple dependencies, and run PackageCompiler on it. Wait an hour, hope it didn't barf irreconcilably, and check the file size.
My packages at work are about 1.5k lines. Precompilation takes about 10-15 seconds. After precompilation, they take about 1-2 seconds to load with "import" or "using". The longest precompilation I've seen has been DifferentialEquations.jl. I've not used it in a while, though I plan to for personal (non-work-related) projects.

My packages have 5-10 dependencies each. I tend to keep my packages and tooling streamlined, and I optimize their performance (everything from load time to data structure and algorithm implementation) quite thoroughly.

Other users at my firm are adopting Julia as well. It isn't displacing Python, though that is a possible future. It's similar enough that many grasp it right away. It's fast enough that it's a viable alternative to Python + C++.

This said, Julia is not a silver bullet[1], though it is an excellent programming language. It has been able to do all of the same tasks as my Python code, only faster (often by multiple orders of magnitude), working with tens to hundreds of GB of data in parallel.

It's a better solution for my workflow, and for an increasing number of others. If it's not to your liking, that's fine. There's no need to try to push it down with what, from my vantage point, seem like specious claims (precompilation times close to an hour or so). If you have actual examples of this, please post them. I'd love to see them. Chances are we could optimize them fairly easily.

[1] https://www.merriam-webster.com/dictionary/silver%20bullet

DiffEq's precompilation is particularly heinous if you aren't on a serious machine, but I think I misspoke. I thought we were discussing package compilation. Unfortunately, even if we are discussing precompilation, there's a ton of cost there, and it goes beyond packages. Time to first anything is often brutal. See the discussions on time to first gradient... This is often up to the developer to work out, but it's a challenge that's pretty unique to Julia itself.

When I see people describing viable alternatives to Python and/or C, I personally look at C++ and Rust. Julia's GC is good for most academic, embarrassingly parallel number crunching, but it is rough for large-scale applications. I've only been able to use Julia in a vacuum for research. For product development, every effort I've seen has eroded insanely fast due to things other languages control much more easily. Those languages can often do the math fast enough too, especially when the cost of failed experiments is accounted for. All those "wait three hours to find out your first gradient descent iteration had a type error that propagated to 1000 compute nodes ($)" moments are gone. It just can't happen in some paradigms, and in others it's far less likely to happen; when it does, the cost is minor because the cost of compilation was already amortized.

You've gotten Julia to produce a runnable binary? Impressive.
It is possible, within specific limits, using StaticTools.jl and StaticCompiler.jl. Sadly for me, my code won't work within the indicated limits.

This is the biggest issue for me for deployable code. I'd love to hand my users a single binary (like Go/Rust) that has all the code and data needed, so there is no precompilation time and startup is instant. I am hoping the Julia team understands how important this is ... language competitors have runtimes (Python, etc.) or binaries (Go/Rust/C++). We really need the latter to distribute code to production.

Imagine a post-compilation step, kind of like the code caching, which wraps everything we need into a binary, with compiled cached code, startup code, runtime libs, etc. That would be amazing, and it would fit well within the Julia paradigm.

StaticCompiler / StaticTools are a bleeding-edge playground. GP is talking about PackageCompiler.jl which is for serious things that can be trusted to work, but is slow and produces huge binaries.
Plus, to make it fully static, you need to ensure that all code paths are being hit.
I've been looking into that for a while, as a way to create a common environment for users in my team. I did get it to work, though as you mention, build times are long for this.
Yeah, or you can just write C++ or Rust and, honestly, see some advantages in doing so from a maintenance angle. Dynamically typed languages have some pretty serious shortcomings. Scientists are smart people; they just need some training to learn to use those tools, or to work with someone who can help them do it.
You appear to be talking about sysimage compilation, which is a whole different beast than package precompilation, which is presumably what the OP was doing in a notebook.
Most likely a single or dual core CPU.

I have similar compilation times with Rust on an old Asus 1215B, where 8GB and an SSD hardly help with cargo's compile-the-world-from-scratch model when starting a new project.

reasonably simple notebook != compile the world from scratch
IIRC there was a bug that triggered precompilation each time you re-instantiated manifests. Pluto notebooks keep a manifest of all the packages they use (for full reproducibility, so it doesn't necessarily match your global environment), so they would effectively precompile, compile, and run each time. I forget the PR that changed this or I would link it, but IIRC this was addressed somewhere around v1.9-RC2, so the v1.9 release should be much nicer to use for Pluto notebooks. I need to double-check this myself, though, since I last tested Pluto in some of the betas and reported this behavior.
It is, when the libraries aren't shipped as native libraries, and one depends on the slow LLVM to compile them.
If a small piece of code depends on a few big, complex packages then, depending on how things work out, the JIT model might have required essentially recompiling all these dependencies at runtime every time. Now there are increasingly better precompilation tools to avoid this.
But that needs to happen every time rustc is updated.