Hacker News new | ask | show | jobs
by tester34 1689 days ago
since here's many compiler hackers then I'd want to ask question:

How do you distribute your frontend with LLVM?

Let's say that I have lexer, parser and emitter written in e.g Haskell (random example)

I emit LLVM IR and then I use LLVM to generate something other

but the problem is, that I need to have LLVM binaries and I'd rather avoid telling people that want to contribute to my OSS project to install LLVM because it's painful process as hell

So I thought about just adding "binaries" folder to my repo and put executables there, but the problem is that they're huge as hell! and also when you're on linux, then you don't need windows' binaries

Another problem is that LLVM installer doesnt include all LLVM components that I need (llc and wasm-ld), so I gotta compile it and tell cmake (iirc) to generate those

I thought about creating 2nd repo where there'd be all binaries compiles for all platforms: mac, linux, windows and after cloning my_repo1, then instruction would point to download specific binaries

How you people do it?

4 comments

You can statically link LLVM, no problem.

In fact, you never have to call any binaries specifically; just do it through code and everything should link at compile-time and become one big binary.

This is in fact what Zig does. Everything is statically linked into one binary that is used for compiling, linking, building, testing etc.
When you have an LLVM frontend, what you generally do is have your driver run the optimization and code generation steps itself using the LLVM APIs rather than using opt/llc binaries to drive this step. That way, you don't need the LLVM binaries, just the libraries that you statically link into your executable.

For example, all of the code in clang to do this is located in https://github.com/llvm/llvm-project/blob/main/clang/lib/Cod...

What if my frontend is written in non cpp? e.g haskell, js, java, c#, etc.
You use the LLVM-C bindings via your favorite FFI mechanism to generate the code then, usually.
In the Julia world, we make redistributable binaries for all sorts of things; you can find lots of packages here [0], and for LLVM in particular (which Julia uses to do its codegen) you can find _just_ libLLVM.so (plus a few supporting files) here [1]. If you want a more fully-featured, batteries-included build of LLVM, check out this package [2].

When using these JLL packages from Julia, it will automatically download and load in dependencies, but if you're using it from some other system, you'll probably need to manually check out the `Project.toml` file and see what other JLL packages are listed as dependencies. As an example, `LLVM_full_jll` requires `Zlib_jll` [3], since we build with support for compressed ELF sections. As you may have guessed, you can get `Zlib_jll` from [4], and it thankfully does not have any transitive dependencies.

In the Julia world, we're typically concerned with dynamic linking, (we `dlopen()` and `dlsym()` our way into all our binary dependencies) so this may not meet all your needs, but I figured I'd give it a shout out as it is one of the easier ways to get some binaries; just `curl -L $url | tar -zxv` and you're done. Some larger packages like GTK need to have environment variables set to get them to work from strange locations like the user's home directory. We set those in Julia code when the package is loaded [5], so if you try to use a dependency like one of those, you're on your own to set whatever environment variables/configuration options are needed in order to make something work at an unusual location on disk. Luckily, LLVM (at least the way we use it, via `libLLVM.so`) doesn't require any such shenanigans.

[0] https://github.com/JuliaBinaryWrappers/ [1] https://github.com/JuliaBinaryWrappers/libLLVM_jll.jl/releas... [2] https://github.com/JuliaBinaryWrappers/LLVM_full_jll.jl/rele... [3] https://github.com/JuliaBinaryWrappers/LLVM_full_jll.jl/blob... [4] https://github.com/JuliaBinaryWrappers/Zlib_jll.jl/releases [5] https://github.com/JuliaGraphics/Gtk.jl/blob/0ff744723c32c3f...

I’ll take advantage of this comment to ask the tangential question: Where can i learn how llvm “compilation” works in Julia?

I know code is only supposed to be JIT’ed and then executed by the runtime (that’s why PackageCompiler exists), but still I’d like to know more about how it works..

Like, if i write a simple pure function in Julia and call code_llvm on it… How “standalone” is the llvm code (if that is even a thing)? When does GC get called? How exactly does the generated code depend on the runtime?

Is there any good explanation of this?

To add to Keno's sibling comment, Julia, as a JIT compiler, essentially creates large chunks of standalone, "static" code, and runs those as much as it can, breaking out into the "dynamic" runtime when it has reached the limits of type inference or for some other reason needs to return to the runtime to perform dynamic dispatch etc... in these instances, we break out of the standalone code and start using the runtime to do things like determine where to jump next (or whether to compile another chunk of static code and jump to that). Note that these chunks of static code can be both smaller or larger than a function, it all depends on what Julia can compile in one go without needing to break out into the dynamic environment.
> Like, if i write a simple pure function in Julia and call code_llvm on it… How “standalone” is the llvm code (if that is even a thing)? When does GC get called? How exactly does the generated code depend on the runtime?

It's standalone unless there's explicit calls to the runtime in it. The most common runtime support is probably heap allocation, so if you see a `jl_alloc_obj` in there, that gets lowered to runtime calls eventually. GC gets called during allocation if the runtime thinks there's been enough garbage generated to have a collection be worth it.

Not a compiler hacker and unfamiliar with the scene but is there a specific reason that `git-lfs` wouldn’t work? It’s the first thing that came to mind reading this. You can also pretty easily fetch specific objects as opposed to everything, so in your README you could direct contributors to only fetch specific binaries for given tasks.