by kazinator 255 days ago

Git hashes have nothing whatsoever to do with whether you can do a clean build of the same tree twice with the same results, bit for bit.

Git hashes or tags can help identify what was built: the inputs.

You only need to know that for traceability: when you hold the released outputs, but do not hold (or are not sure you hold) the matching inputs.

If builds are reproducible, the traceability becomes more meaningful.

In the TXR project, I have a ./configure option called --build-id. This sets an ID that is appended to the version string embedded in the executable. By default it is empty and unused. It is meant to be useful for people who interact with the code, so they can check what they are running (things can get confusing when you are going back and forth among versions, or making local changes).

If you set the build ID to the word "git", then it is calculated using:

  git describe --tags --dirty

That's probably what this author should be using. It gives you a meaningful ID that relates the build to the most recent release tag and tells you whether the repo was dirty.

  $ git describe --tags --dirty
  txr-302-20-g77c99b74e-dirty

We are (sadly, only) 20 commits after 302, at a commit whose short hash is 77c99b74e, and the repo is in a modified state.
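
For anyone who wants to pick that string apart in a script, here is a minimal sketch using plain POSIX parameter expansion. The function name parse_describe is made up for illustration. It assumes the long form <tag>-<count>-g<hash>[-dirty]; when HEAD sits exactly on a tag, git describe prints only the tag unless you add --long, and this sketch does not handle that case.

```shell
#!/bin/sh
# Split `git describe --tags --dirty` output into its parts.
# Assumption: long form, i.e. <tag>-<count>-g<hash>[-dirty].
# parse_describe is a hypothetical helper, not part of git or TXR.
parse_describe() {
    desc=$1
    dirty=no
    case $desc in
        *-dirty) dirty=yes; desc=${desc%-dirty} ;;
    esac
    hash=${desc##*-g}     # text after the last "-g"
    rest=${desc%-g*}      # drop the trailing "-g<hash>"
    count=${rest##*-}     # commits since the tag
    tag=${rest%-*}        # the tag itself (may contain hyphens)
    printf 'tag=%s count=%s hash=%s dirty=%s\n' "$tag" "$count" "$hash" "$dirty"
}

parse_describe txr-302-20-g77c99b74e-dirty
# prints: tag=txr-302 count=20 hash=77c99b74e dirty=yes
```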

I have the Makefile rigged so that it keeps track of the most recent build ID in a little .build_id file. If the build ID changes relative to what is in that file, the Makefile forces a rebuild of the .o files which incorporate the build ID.
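
The general idea can be sketched in shell roughly like this. This is not the actual TXR Makefile logic, just an illustration under my own assumptions; update_build_id is a hypothetical name. The point is that the stamp file is rewritten only when the ID differs, so anything that depends on it is rebuilt only on a real change.

```shell
#!/bin/sh
# Record the build ID in a stamp file, rewriting it only when the ID changes.
# Returns 0 if the stamp was (re)written, 1 if it was already current.
# update_build_id is a hypothetical helper, not taken from the TXR sources.
update_build_id() {
    stamp=$1
    new_id=$2
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$new_id" ]; then
        return 1          # unchanged: leave the file and its timestamp alone
    fi
    printf '%s\n' "$new_id" > "$stamp"
    return 0
}

# Demo in a scratch directory; a real build would pass the output of
# `git describe --tags --dirty` as the second argument.
demo_dir=${TMPDIR:-/tmp}/build_id_demo.$$
mkdir -p "$demo_dir"
if update_build_id "$demo_dir/.build_id" "txr-302-20-g77c99b74e-dirty"; then
    echo "build ID changed: objects embedding it need a rebuild"
fi
if ! update_build_id "$demo_dir/.build_id" "txr-302-20-g77c99b74e-dirty"; then
    echo "build ID unchanged: nothing to rebuild"
fi
rm -rf "$demo_dir"
```

In a Makefile, the objects that embed the ID would list the stamp file as a prerequisite, so make's ordinary timestamp check does the rest.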

Also, there is no need to be generating dynamic #include material just for this. A simple -Dsymbol=var option in the CFLAGS will define a preprocessor symbol:

  CFLAGS += -DMY_BUILD_ID=\"$(my_build_id)\"

shoo 255 days ago

Yep, your way of framing it is clearer. Embedding version information in released binary artefacts helps answer the question of "what version of the software even produced this output/is crashing in production?". This is the problem that the author is focusing on, and it is an important thing to sort out early in any serious project, especially if you ship software that gets deployed to customer machines. Setting this up early will probably pay for itself even before the software is in production: knowing what version is deployed where cuts down on time wasted through confusion about which experimental version is running in which non-prod environment.

It's addressing a distinct problem from "if we rebuild any given version, perhaps some later time, do we even get the same binary?" which is what people usually mean by "reproducible builds".

Your tip that injecting build IDs can be done with compiler flags, without needing to generate header files, is a great one.

Passing version info without code generation using linker flags can also be done in other languages & toolchains, e.g. with Go projects, the Go linker exposes an -X flag that can be used to set the value of a string variable in a package [1] [2].

A step beyond this could be to explicitly build a feature into your software to help the user report bugs or request support, e.g. the user clicks a button and the software dumps its own version info plus info about what the user is doing and their machine, packages it up and sends it to your support queue. It doesn't make sense to do this for backend services, but you do see support features like this in PC games to help users easily send high-quality bug reports.

[1] https://pkg.go.dev/cmd/link

[2] https://www.digitalocean.com/community/tutorials/using-ldfla...

ignoramous 255 days ago

> Passing version info without code generation using linker flags can also be done in other languages & toolchains, e.g. with Go projects, the Go linker exposes an -X flag

Someday, Go programs won't have to do this: https://github.com/golang/go/issues/50603

kazinator 255 days ago

In short, "traceable bill of materials" != "reproducible build"

Which golfs to "traceable" != "reproducible"