by xnyanta 1094 days ago
I attended a local CNCF meetup where the Chainguard folks presented Wolfi and their related tools for creating container images and SBOMs.

I was already skeptical of the product, having heard of it before. Unfortunately, attending that talk confirmed to me that they've just re-invented a severely limited version of Nix powered by yaml files that can output SBOMs in a standard format.

Their software repository only has around 500 packages vs the 80k+ in nixpkgs. When I read "Alpine Linux", I was immediately reminded of its use of the musl libc, which has endless DNS resolution issues[1][2] in and outside of Kubernetes. Thankfully, it seems like they realized bundling musl was a bad idea and ship glibc instead.

I'll stick to Nix and nixpkgs to build my reproducible container images with full dependency graphs back to the source code.

1: https://martinheinz.dev/blog/92

2: https://news.ycombinator.com/item?id=35058094


We're rapidly approaching 10k packages. Here's today's count:

    / # apk update
    fetch https://packages.wolfi.dev/os/aarch64/APKINDEX.tar.gz
     [https://packages.wolfi.dev/os]
    OK: 9494 distinct packages available

We're definitely coming at this from a different angle from Nix, but the approaches are pretty complementary. I'm a big fan of all the work they do.

musl vs. glibc is one of the big departures we make from Alpine, though. We use glibc everywhere because of those issues you pointed out.

> OK: 9494 distinct packages available

I opened that APKINDEX file and it had duplicate entries for a ton of packages with different versions. Taking a look at https://github.com/wolfi-dev/os, I only see about 840 yaml files, which I assume define the packages. I don't think claiming to have 10k packages is a good claim to make when only 10% of them are actually different pieces of software. Nixpkgs would have millions of packages if we added up every single unique package from every revision.

The real number is probably somewhere in the middle - one yaml file can define many packages - see the gcc or clang or argocd ones for examples of that.

glibc explodes into a few dozen, for example.
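The counting question in this subthread (index entries vs. distinct names vs. source packages) can be checked mechanically. Below is a rough Python sketch, not anything Wolfi or apk ships. It assumes the usual APKINDEX text layout: blank-line-separated records with one `K:value` field per line, where `P` is the package name, `V` the version, and `o` the origin (the source package a subpackage was built from). The function names are made up for illustration.

```python
def parse_apkindex(text):
    """Split APKINDEX text into records (one dict of single-letter fields each)."""
    records = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that are not "K:value"
                fields[key] = value
        if fields:
            records.append(fields)
    return records


def summarize(records):
    """Count index entries, distinct package names, and distinct origins."""
    named = [r for r in records if "P" in r]
    names = {r["P"] for r in named}
    # Assumption: a record without an "o" field is its own origin.
    origins = {r.get("o", r["P"]) for r in named}
    return {"entries": len(records), "names": len(names), "origins": len(origins)}
```

Feeding it the text of the `APKINDEX` file extracted from `APKINDEX.tar.gz` gives three numbers: every entry (including multiple versions of the same package), distinct names (subpackages such as `-dev` counted separately), and distinct origins, which is probably the closest analogue to "one yaml file".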

I think it's more subpackages such as -dev, -lib and -doc variants. These are defined as part of the parent package but count as distinct packages.

Even so, I did a quick search on repology and Nix derivations with multiple outputs (the nix lingo analogous to the subpackages you mentioned) are counted as a single package. For example, bash has 5 outputs but only counts for 1 package in the 85k figure, so I think comparing 900 packages to 85k is a valid comparison.

Anyway, this is all beside the point I was trying to make, which is that I don't see why I should use _yet another_ software distribution that has 1% of the number of packages found in a mature distribution that already has frequent automatic updates and bleeding-edge software revisions.

We needed Wolfi to be able to create minimal (distroless if you like) container images based on glibc with 0 vulnerabilities. Turns out a lot of other people are interested in Wolfi for various reasons, and we're more than happy to work with them.

You definitely don't need to use Wolfi! But I would say, if you run containers you might want to check out Chainguard Images: https://github.com/chainguard-images/images

Hey, thanks for chiming in. How do they complement each other?

Can anyone help me understand musl libc and DNS issues? Note that I’m most interested in this “DNS over TCP” issue, since the other case I’ve heard of is for custom DNS setup—not for resolving host names in a default configuration.

My reading indicates that DNS resolution simply might not work in certain cases. This seems like a huge problem, yet Alpine Linux is widely deployed and I think Zig uses musl libc as well. In fact, every fully static binary I’ve seen (except for Go?) relies on musl.

For what it’s worth, I’ve seen DNS errors on Alpine (specifically I was getting EAGAIN), but I assumed this was unrelated to musl (and am still unsure). In general, I feel like I see a lot more transient networking errors on Alpine, and I wonder if this is related.

Edit: I also didn’t think DNS would be in libc, so I’ve got a lot to learn…

If you use Musl 1.2.4+ (or Alpine 3.18+), there are no longer the same DNS fallback issues: https://www.openwall.com/lists/musl/2023/05/02/1

To summarize the issue: DNS is done optimistically over UDP because it's faster, but this doesn't work when DNS responses are large because of the design of UDP. TCP should be used as a fallback mechanism when responses are large. This is uncommon normally, but increasingly DNS responses are large in special scenarios; for instance when you're querying an internal DNS for service discovery (read: k8s or nomad deployments, most commonly).
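To make the UDP-then-TCP fallback concrete, here is a minimal Python sketch. It is not musl's or glibc's code. It builds a bare DNS query, checks the TC (truncation) bit in the response header, and retries over TCP when that bit is set. The function names and the injected `udp`/`tcp` callables are assumptions made for illustration; the header layout and the 2-byte length prefix on TCP come from RFC 1035.

```python
import socket
import struct

TC_FLAG = 0x0200  # truncation bit in the 16-bit DNS header flags field


def build_query(name, qtype=1, query_id=0x1234):
    """Encode a minimal query: one question, RD set, QTYPE 1 = A, QCLASS 1 = IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def is_truncated(response):
    """True if the server set TC, i.e. the full answer did not fit."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & TC_FLAG)


def send_udp(query, server, timeout=2.0):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        data, _ = sock.recvfrom(4096)
        return data


def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def send_tcp(query, server, timeout=2.0):
    # DNS over TCP prefixes each message with its length as a 2-byte integer.
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(query)) + query)
        length = struct.unpack("!H", _recv_exact(sock, 2))[0]
        return _recv_exact(sock, length)


def resolve(query, udp, tcp):
    """Try UDP first; retry the same query over TCP only if TC is set."""
    response = udp(query)
    if is_truncated(response):
        response = tcp(query)
    return response
```

Against a real server you would call something like `resolve(q, lambda m: send_udp(m, server), lambda m: send_tcp(m, server))`. A resolver without the fallback is effectively `resolve` with the `if` removed: it hands back whatever fit in the UDP reply.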

Musl's maintainer interpreted the spec for a libc's resolver to not require TCP fallback (source: https://twitter.com/RichFelker/status/994629795551031296?lan...), so for a long time Musl simply didn't support this feature, justifying it as better UX because of the more predictable performance.

As an otherwise very happy Alpine user, I don't agree with the maintainer on this interpretation, but I am glad the feature was added and the issue is no longer a concern!

I’d found bits and pieces of this, but I didn’t have all the context. Thank you for summarizing!

I'd say he was wrong here, and his assumption was incorrect.

RFC 2181 specifically says 'Where TC is set, the partial RRSet that would not completely fit may be left in the response'.

'may be' being the key words. This would mean that it's up to the implementation to decide whether to include any records at all, and many do not.

It sounds like DNS-over-TCP should be supported now: https://wiki.musl-libc.org/functional-differences-from-glibc...

Edit: Hacker News link from last month:

https://news.ycombinator.com/item?id=35964717

> endless DNS resolution issues

should now be resolved (UDP fallback to TCP), even though it's arguable that this is actually rooted in issues elsewhere in the stack.

As mentioned correctly in [2], DNS issues with musl stem from the fact that it follows the DNS specs strictly and exposed bugs in certain DNS servers.

For an HTTP server replying with 200 OK instead of a 404 Not Found upon ENOENT, the situation is clear that this is not a client-side bug. The same should be assumed of DNS/NXDOMAIN.

As much as I agree with your reasoning, I have experienced this kind of breakage first-hand due to cloudflare and DNSSEC and will continue using DNS resolving code that doesn't cause me pain and suffering (anything that isn't musl) because making this work with musl is quite literally out of my control.

Wait, are you me? I tried to convince musl to not break in this case (https://www.openwall.com/lists/musl/2022/12/04/1), but mostly got a "we're right, cloudflare's wrong" answer. Tweeted at Cloudflare about the issue (https://twitter.com/KennyMacDermid/status/160055878578481971...) and never heard back. I really don't care who's right, and can't change who's using Cloudflare+DNSSEC, so instead I just don't use musl.

[Previous comment](https://news.ycombinator.com/item?id=35058094): ---

My personal 'musl broke it' story comes from resolving domains from Cloudflare that use DNSSEC in a K8s cluster. Basically this:

  - K8s sets the container to use `ndots:5`, causing the search list to be used
  - Musl walks that search list looking for the domain
  - Cloudflare does not set the NXDOMAIN flag on a DNSSEC domain but does include an NSEC record (if you query with the dnssec flag).
  - Musl takes this 'NOERROR' reply and returns an EAI_NODATA.

Is Cloudflare wrong? I don't know, maybe. They say some things about the standards[0] and that it's technically 'right'. I don't see why they couldn't change the behaviour for queries without the dnssec flag, but I digress.

The issue is that every other libc I tested will continue searching and actually resolve the domain. Musl is the odd one out, and _only_ in the case where the search list ends up with a domain using Cloudflare and DNSSEC.

Even if Musl is 'right' here, when it disagrees with major implementations and a major DNS nameserver, does it really matter?

[0]: https://blog.cloudflare.com/black-lies/ ---
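The failure mode in the quoted comment can be modelled in a few lines. This is a simplified Python sketch, not any libc's actual resolver. The candidate ordering follows the `ndots` description in resolv.conf(5), and real implementations differ in the details. The `stop_on_nodata` flag, the `lookup` callable, and all names are hypothetical, standing in for "treat NOERROR with no records as final" versus "keep walking the search list".

```python
from enum import Enum


class Rcode(Enum):
    ANSWER = "answer"      # NOERROR with records
    NODATA = "nodata"      # NOERROR with an empty answer section
    NXDOMAIN = "nxdomain"  # name does not exist


def candidates(name, search, ndots):
    """Names to try, in order, for a simplified search-list resolver."""
    if name.endswith("."):
        return [name]  # already fully qualified, no search list
    absolute = name + "."
    expanded = [f"{name}.{suffix}." for suffix in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


def resolve(name, search, ndots, lookup, stop_on_nodata):
    """Walk the candidates; lookup(fqdn) returns (Rcode, list_of_addresses)."""
    for fqdn in candidates(name, search, ndots):
        rcode, addrs = lookup(fqdn)
        if rcode is Rcode.ANSWER:
            return addrs
        if rcode is Rcode.NODATA and stop_on_nodata:
            return None  # give up here instead of trying the next candidate
    return None
```

With `ndots=5`, a name like `example.org` is tried against every search suffix first. If one of those suffixes lives in a zone that answers NOERROR with no data instead of NXDOMAIN, a resolver with `stop_on_nodata=True` never reaches `example.org.` itself, while one with `stop_on_nodata=False` does. Lowering `ndots` to 1 makes the absolute name the first candidate, which sidesteps the problem at the cost of requiring fully qualified names for in-cluster services.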

> Wait, are you me?

If you hadn't made that HN post I would probably still be scratching my head as to why I have intermittent DNS resolution issues in my personal Kubernetes cluster. I've been having them for years, but only for some specific software (my mediaserver stack, packaged by linuxserver.io). Since this isn't mission-critical stuff I just shrugged it off and dealt with it. Eventually I read your post and it all clicked. Sadly, I can't find any maintained distributions of that software that don't use Alpine/musl, so I added[1] dnsConfigs to all of my deployments using musl to force ndots to 1, and accepted the fact that all of my cluster name resolution for those deployments will have to be fully qualified. It's a really frustrating situation, and it made me realize how badly musl plagues the container ecosystem due to its use in Alpine Linux and Alpine's popularity because of the small images it produces. Alpine Linux and musl are on my shitlist for life.

1: https://github.com/starcraft66/infrastructure/commit/3b53bb0...

musl doesn't implement TCP DNS at all. When it gets a response with the truncation bit set, it just assumes the completed records in the truncated response were sufficient, the best records even, and proceeds as if the lookup succeeded. It's hard to take seriously claims that it's strictly implementing the protocol.

It does as of a month or so ago. https://news.ycombinator.com/item?id=36494177

I would love to see non-trivial examples of using the nix toolchain to build images with multiple OSes, architectures, and SBOMs. As someone unfamiliar with the nix ecosystem, it seems like a tough ask for contributors to require nix knowledge rather than just changing out my existing base image.

I'm not sure what you mean by "non-trivial" but here's a simple discord bot I wrote in Python, that I distribute as an OCI image and that is built with Nix for both x86_64 and aarch64 linux via GitHub Actions: https://github.com/starcraft66/attention-attention

There is no SBOM because I didn't bother publishing one, but the way Nix builds derivations, you basically get the SBOM for free. You could use a tool like sbomnix[1] to trivially generate an SPDX-format SBOM from the Nix derivation that builds the container image.

Edit: Since you mention swapping out base images, I think there is a misconception about how building images with Nix works. There is no such thing as a "base" image. Nix builds images from the Dockerfile equivalent of "scratch". You would ditch the Dockerfile completely and use only Nix to build the image.

1: https://github.com/tiiuae/sbomnix