Hacker News new | ask | show | jobs
by aleclm 807 days ago
I think most of your concerns about messing with the environment are sensible only under the assumption that you actually do `source environment`.

In truth, we suggest to do that only so you use the GCC we distribute for the demo binary. The actual way this is intended to be used is through the `./revng` script. In that way, the environment changes only affect the invocation of `revng`.

This is documented here: https://docs.rev.ng/user-manual/working-environment/ We should probably add a warning about `source ./environment`.

Now, let's get to each of your comments :D

> though thankfully not LD_LIBRARY_PATH

We spent a lot of time to have a completely self-contained set of binaries where each ELF refers to its dependencies through relative paths. LD_LIBRARY_PATH is evil.

> Mostly prefixed "HARD_"

Those are just used by our compiler wrappers, I don't think those environment variables collide with anything in practice.

> It sets `AWS_EC2_METADATA_DISABLED="true"`

Original discussion: https://github.com/revng/revng/pull/309#discussion_r12805759...

I guess we could patch the AWS SDK to avoid this. Anyway, it affects only when rev.ng is running in the cloud.

> export RPATH_PLACEHOLDER=... > export HARD_FLAGS_CXX_CLANG=...

Those are used when linking binaries translated by revng. If you're not interested in end-to-end binary translation, they don't matter.

> it means 'runpath' which is a similar but much less useful construct

We specifically want DT_RUNPATH. DT_RPATH is deprecated and there might an use case for overriding our libraries with LD_LIBRARY_PATH.

> There's a good chance you can completely remove the environment mangling

I think your observations concerning "mangling the environment" are only valid for non-private environment variables. The following variables are private: RPATH_PLACEHOLDER, HARD_*, REVNG_*. Also, they are all only for binary translation purposes. We could push them down into some smaller-scoped compiler wrappers, but those make sense only if we can get rid of environment entirely, which we can't because we ship Python.

> a combination of setting different flags when building clang

No, the flags also affect the linker and there's some features of our wrappers that cannot simply be burned in. We can push them in more private places, though.

> a lot of failure modes can be removed > libc++abi and libunwind can and probably should be statically linked into the libc++

We no longer have issues with that, our build system is pretty reliable in that regard. LLVM is just one of the components, these things need to work robustly in general, and they do (with quite some effort).

You seem to be wary of using dynamic linking, we put some effort in it, now it works pretty good and always looks up things in the right place, and without ever hardcoding absolute paths anywhere, nor any install phase that "patches" the binaries. The unpacked directory can be moved wherever you want.

> I am completely out of patience with distributing dynamically linked programs on Linux

You're thinking of some other solution, our solution does not use LD_LIBRARY_PATH and all the binaries reference each other in a robust way using `$ORIGIN`. Try:

    ./root/bin/python ./root/bin/revng artifact --help
It works.

But again, doing `source environment` is mostly for demo purposes, in the actual use case, you just do `./revng` and your environment is untouched.

We ship our Python, but you don't have to use it: you're supposed to just do ./revng (or interact over the network in daemon mode).

Our approach is: use whatever tool you like for scripting as long as it can parse our YAML project file, make changes to it, and then invoke `./revng artifact` (or interact with the daemon): https://docs.rev.ng/user-manual/model-tutorial/

Result: we get to use our Python version (the latest) and you get to use whatever language you like. Then we'll provide on pypi wrappers that help you with that and are compatible with large set of Python versions.

tl;dr Don't `source ./environment`, use `./revng`.

> Thanks again for shipping, and I hope some of the above feedback is helpful!

I'm happy there's someone that cares about this :D

Our next big iteration of this might involve simplifying things a lot by adopting nix + mount namespace to make /nix/store available without root.

Maybe this is not the right place for discussing this, we can chat on our discord server if you'd like :)

1 comments

Not setting environment variables is indeed solved by not setting environment variables - but `source ./environment` is what's written on the announcement page at the top of this thread. './revng' doesn't appear anywhere on it.

You haven't set LD_LIBRARY_PATH but other people will do. Also LIBRARY_PATH, and put other stuff on PATH and so forth. Module systems are especially prone to this, but ending up with .bashrc doing it happens too.

You have granted the user the ability to override parts of the toolchain with environment variables and moving files to various different directories. That's nice. Some compiler devs will appreciate it. Also it's doing the thing Linux recommends for things installed globally so that's defensible.

In exchange, you will get bug reports saying "your product does not work", where the root cause eventually turns out to be "my linker chose a different library to my loader for some internal component". You also lose however many people try the product once, see it immediately fall over and don't take the time to tell you about the experience.

I think that's a bad trade-off. Static linking is my preferred fix, but generally anything that stops forgotten environment variables breaking your software in confusing ways is worth considering.

> `source ./environment` is what's written on the announcement page at the top of this thread. './revng' doesn't appear anywhere on it.

You're right, but after that there's a link to the docs where we say to use `./revng`. The blog post is for the impatient :) On the long run the docs is what most people will look at.

I don't think we want to support use cases that might break system packages too. If you set LD_LIBRARY_PATH to a directory where you have an LLVM installation, that might break any system program using LLVM too... Why should we try to fix that using `DT_RPATH` (which is a deprecated way of doing things) when system components don't do it?

We might cleanup the environment from LD_LIBRARY_PATH and other stuff, that might be a sensible default, yeah. Also we might have some sanity check printing a warning if weird libraries are pulled in.

But it's hard to take a decision without a specific use case in mind. If you have an example, bring it forward and I'm happy to discuss what should be the right approach there.

LLVM picking up the wrong libraries from the environment has cost me at least a couple of months over the last decade or so. Maybe twenty instances of customers being broken, ten hours or so in meetings explaining the problem and trying to persuade people that the right thing really is different for the system compiler vs your bespoke thing.

If you think it's better for your product to find unrelated libraries with the same name at runtime, you go for it.

Detecting that failure mode would be an interesting exercise - you could crawl your own address space after startup and try to guess whether the libraries you got are the ones you wanted. Probably implementable.