by eeeficus 2515 days ago
It’s a simple and practical solution. It irks me for some reason but I can’t think of something better.
4 comments

It's too bad the squashfs idea was dropped. Having to download and unpack the archive somewhere seems like an extra step that should be unnecessary vs just loading the module and having the tree appear in /sys for you to point your compiler directly at.

Also a bummer that it looks like you have to be running the kernel to get access to this. Perhaps the archive could be marked off in the binary somehow so that there's a way to access it for non-running kernels (e.g. the dkms use case of wanting to build the module for that kernel you just installed ahead of booting it for the first time).

Overall binary size is a legit concern, but if this is going to be in an optionally-loaded module anyway, it seems weird to make that call just based on memory usage. I imagine most distro kernel builds will just disable this, since they already ship the headers in separate packages.

squashfs was dropped because Greg doesn't like squashfs, which I can understand somewhat (it doesn't have an active maintainer really, there's no userspace library, just binaries, etc.).

However, we're doing a project where I work that depends heavily on squashfs. It worries me that it was dropped from this, even though it would have been the nicer option, because people are put off by it not having a maintainer. Hopefully someone picks it up :)

Oh, gosh. Since when has it been without a maintainer? I would think it's quite an important piece of most embedded Linux projects. Those companies really should sponsor someone to take care of it.

My personal project also depends on squashfs. Are there any other options for read-only compressed rootfs?

Squashfs-tools are a bit rough around the edges, that's for sure. I patched in the ability to produce an image without root for my own use, but I think it would be generally useful. The sort list file format is funky. It's "filename_string priority_integer", so it will not work for file names containing spaces (or newlines, but those are far less common). It hasn't been a problem for me yet, and I can always patch it more. One could say that people with spaces in their file names deserve it, though that's an entirely different issue.
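To make the spaces problem concrete, here is a minimal sketch of a more forgiving parser. It is not squashfs-tools code: `parse_sort_line` is a hypothetical helper that treats the last blank-separated token as the priority, so everything before it (spaces included) is the file name. Newlines in names still cannot be represented this way.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of squashfs-tools.
 * Splits "filename priority" at the LAST run of blanks, so the file
 * name may itself contain spaces. Modifies `line` in place.
 * Returns 0 on success, -1 if the line is malformed. */
static int parse_sort_line(char *line, char **name, int *priority)
{
    assert(line != NULL && name != NULL && priority != NULL);

    /* Drop the trailing newline and any other trailing whitespace. */
    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';

    /* Walk back to the start of the last token (the priority). */
    size_t i = len;
    while (i > 0 && !isblank((unsigned char)line[i - 1]))
        i--;
    if (i < 2)              /* no separator, or nothing before it */
        return -1;

    /* The last token must be a complete decimal integer. */
    char *end;
    errno = 0;
    long value = strtol(line + i, &end, 10);
    if (errno != 0 || end == line + i || *end != '\0' ||
        value < INT_MIN || value > INT_MAX)
        return -1;

    /* Cut the name just before the separating blanks. */
    size_t name_end = i - 1;
    while (name_end > 0 && isblank((unsigned char)line[name_end - 1]))
        name_end--;
    if (name_end == 0)
        return -1;
    line[name_end] = '\0';

    *name = line;
    *priority = (int)value;
    return 0;
}
```

The trade-off is that a file whose name ends in " 5" still needs an explicit priority after it, but that is unambiguous because the priority is always the final token.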

Btrfs supports compression, and it has a "seed" flag that makes a filesystem read-only. If you 'btrfs device add' a second, read-write device, the filesystem becomes read-write with all writes directed to the second device. That device can be a /dev/zram device for a volatile overlay (e.g. for a LiveOS boot), or you can add a blank partition and then remove the seed, which causes the seed data to be replicated to the rw partition. Plus all metadata and data are checksummed.

Btrfs has had zstd support since Linux 4.14, and a mount-time option for the compression level since Linux 5.1. So you could get very good compression ratios, close to those of squashfs, but squashfs will still come out slightly ahead because it also compresses its own metadata, whereas Btrfs doesn't.

My best guess is that squashfs is a sufficiently successful project that it has allowed most of the distributions that depend on it (quite a few) to rest on their laurels.

Looks like he might (?) be coming back: https://github.com/plougher/squashfs-tools/issues/54#issueco... although the release he's talking about didn't happen.

The last commit that wasn't an API refactoring in the kernel tree is a3f94cb99a854fa381fe7fadd97c4f61633717a5, which is from Aug 2, 2018.

It is not without a maintainer. But you make a good point that the companies that use it should sponsor it. I still have to maintain it for free in my spare time, and spare time gets less and less each year.

In fact it wouldn't surprise me if the parent commenter is one of the people working for a multi-billion dollar company that got offended when I told them I wouldn't do some work for them, for free. Hence the suspicion there is an axe to grind.

squashfs has horrible performance. All requests to the block layer are 512 Bytes. Other filesystems like ext4 make much bigger requests and perform much better in the end despite the compression of squashfs leading to lower overall data volume.

Disclaimer: Measured 2 years ago on ARM32, emmc, with a 4.1(?) kernel.

Try using the SQUASHFS_4K_DEVBLK_SIZE config option next time

By default Squashfs sets the dev block size (sb_min_blocksize) to 1K or the smallest block size supported by the block device (if larger). This, because blocks are packed together and unaligned in Squashfs, should reduce latency.

This, however, gives poor performance on MTD NAND devices where the optimal I/O size is 4K (even though the devices can support smaller block sizes).

Using a 4K device block size may also improve overall I/O performance for some file access patterns (e.g. sequential accesses of files in filesystem order) on all media.

Setting this option will force Squashfs to use a 4K device block size by default.

If unsure, say N.
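Read literally, the rule in that help text boils down to "take the configured size, or the device's minimum if that is larger". A small userspace model of that rule, as a sketch only: `pick_devblk_size` and the enum names are made up for illustration and are not kernel API.

```c
#include <assert.h>

/* Userspace model only; these names are illustrative, not kernel API. */
enum { DEVBLK_DEFAULT = 1024, DEVBLK_4K = 4096 };

/* Returns the device block size implied by the quoted help text:
 * the configured size (1K by default, 4K with the option set), or the
 * smallest block size the device supports if that is larger. */
static unsigned int pick_devblk_size(unsigned int device_min, int use_4k)
{
    unsigned int wanted = use_4k ? DEVBLK_4K : DEVBLK_DEFAULT;

    assert(device_min > 0);
    return device_min > wanted ? device_min : wanted;
}
```

So on a device with 512-byte sectors you would expect 1K requests by default and 4K with the option set, which is still well below the request sizes reported for ext4 elsewhere in this thread.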

I'm quite sure I tried all the options available in the kernel I used back then without achieving performance comparable to ext4. My project manager was convinced that squashfs makes things faster (as I had hoped initially, because the overall data volume is smaller), so I had a hard time convincing him that we should just drop that "optimization" from the project plan. (He was one of those who can prefer checkmarks over technical merit.) I don't remember the 4K option for sure, but if it existed, we tried and measured it. What size does ext4 read from the block device? I'm reading this on holiday on my phone, so I cannot easily fire up blktrace, but I would guess it's 128K or even 256K. So still far from 4K.
eMMC is not an MTD device. I measured only on eMMC.
A big part of the problem might be xz decompression, that's been my discovery anyway.

https://bugzilla.redhat.com/show_bug.cgi?id=1717728

I tried various compression algorithms. I don't think there was a CPU bottleneck, at least not with the less aggressive compressions. blktrace showed the difference compared to ext4: squashfs does all reads one block at a time.

It was in a previous job. I don't have access to the details anymore. And the kernel was not the newest. But squashfs looked unmaintained already then and that's what they're saying elsewhere in the discussion. So I fear nothing has changed.

I'm the author and maintainer of Squashfs and I can assure you that Squashfs is not unmaintained. Over the last couple of years Squashfs has been stable, without requests for new features, and so work has mostly been security improvements and bug fixes. But there is a big difference between that and claiming Squashfs is unmaintained.

In fact I don't know why you're claiming it is unmaintained. Got an axe to grind perhaps?

I'm not claiming it's unmaintained. I'm saying that the maintainer is not active. You yourself mention elsewhere in this thread that you have little free time to maintain squashfs, so that seems fair.

No axe to grind; it's a cool project and that's why we decided to use it. Thanks for your work on it, and we'll send patches if/when we have them :)

It irks you because storing source code directly in the final binary is bonkers, but unfortunately it is the only way to reveal all the complexity that can be exposed in a C header file.

This isn't really just a C problem though. Even Rust has a similar problem for propagating macros. The reason for this limitation is really that making an ABI for metaprograms (macros) is exceedingly difficult.

Are there other languages that provide interfaces as cleanly as exposing C header files?
Java interfaces.

I think the concept of coding to an interface (or an API specification) goes beyond what Java did. In my professional life, I see people constantly downplaying and ignoring the importance of a formal and stable API specification, only to suffer the consequences later.

If you just give me a .jar do I get those interfaces in a consumable form?

Said differently, can I download your .jar and write my own code to interface with it while on a desert island without any other resources?

Yes.

If you have a random JAR file, an IDE can introspect it to see what classes are in it and what methods are in those classes.

The only exception is if you run it through some kind of obfuscator first.

Yes but JAR files are basically equivalent to source code parsed and serialised as bytecode with only the most rudimentary optimisations applied. I don't really see a difference between that and carrying gzipped headers with the kernel. Yes, the .class format is cleaner, but it's still very close to what the programmer wrote. Another plus to carrying the header with the kernel is that you can carry the preprocessed header to make sure the user doesn't set the wrong defines, and you can add compiler checks to check that the user isn't using a too old or too new compiler with incompatible ABI.
Doesn't necessarily work for the kernel, but GIR and typelibs provide machine-readable descriptions of C APIs: https://github.com/GNOME/gobject-introspection
From what I know about it, gobject-introspection has some nice properties, but one killer drawback: it's incompatible with cross compilation. This is apparently because it requires running binaries compiled for the target system as part of the build process. You actually can cross compile if you have an emulator for that system handy [1], but that's horrible.

Apparently the reason it requires running the compiled binaries is that GObject types are only registered at runtime, within so-called "_get_type" functions. For more typical systems, everything needed can be determined at compile time. Too bad there's no portable way to ask a C compiler to dump what it knows about a source file, but if you just want things like struct sizes and field offsets, you can compile a C file that embeds them as global variables, and then extract the variable values. For more advanced introspection there are many less-portable options including Clang's API, GCC-XML, parsing debug info, or writing your own compiler (easier for C; it seems that parts of gobject-introspection work like this).
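A minimal sketch of the "embed them as global variables" trick, assuming a made-up `struct packet_hdr` in place of a real header and hypothetical `probe_*` symbol names. You compile this for the target without running it, then read the initialized values back out of the object file by symbol name, which is why it stays compatible with cross compilation.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a struct from the headers being inspected (hypothetical). */
struct packet_hdr {
    unsigned char  version;
    unsigned short length;
    unsigned int   sequence;
};

/* Each fact becomes an initialized global. After compiling (not running)
 * this file for the target, the values sit in the object's data sections
 * and a tool can look them up by symbol name. */
const size_t probe_sizeof_packet_hdr   = sizeof(struct packet_hdr);
const size_t probe_offsetof_length     = offsetof(struct packet_hdr, length);
const size_t probe_offsetof_sequence   = offsetof(struct packet_hdr, sequence);

/* Sanity check that holds on every ABI: the first member is at offset 0. */
static_assert(offsetof(struct packet_hdr, version) == 0,
              "first member must start at offset 0");
```

The actual numbers depend on the target ABI's padding rules, which is the whole point: the compiler for that target is the only thing that knows them.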

Anyway, another interesting comparison is DTrace's CTF (Compact Type Format) [2], a simple binary format that describes the kernel's C struct layouts, function signatures, etc. This information is simply converted from compiler-generated debug info [3], but it's stripped down enough that it can be embedded into every kernel without too much size overhead. When the DTrace compiler is invoked to compile a user hook, it parses the CTF data and exposes the types and functions to the user's code (which is written in a custom C-like language).

Ironically, BPF has BTF, which is a very similar-looking format that encodes very similar kinds of data – but is used for a completely different purpose. Specifically, it's only used to encode types and functions defined by BPF programs, to allow the kernel to pretty-print things. But in theory BTF could be repurposed to work like CTF: you would need to generate BTF information for the kernel itself, and then Clang could be extended to support "including" BTF files in place of C headers. However, this option was apparently discussed and rejected [4]. I haven't read the original threads to find out why, but I suspect it might involve:

- Lack of existing tooling to do the above;

- Lower expressivity compared to C headers, e.g. the inability to encode macros (although this could be fixed);

- Desire to use the information for building not just BPF hooks but also full-fledged kernel modules.

[1] https://maxice8.github.io/8-cross-the-gir/

[2] https://github.com/oracle/libdtrace-ctf/blob/master/include/...

[3] https://www.freebsd.org/cgi/man.cgi?query=ctfconvert&sektion...

[4] https://lwn.net/Articles/783832/

Yeah, I'd love to have something like BTF or CTF used widely for machine-readable type information. (https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf... gives some further information there.)

The limitations regarding macros sound like the biggest issue to me (both code-like macros and just simple defined names for values via #define). I'd love to see solutions for that. What do you think that would look like?

Interesting writeup! I didn't realize there was an active attempt to generate BTF for the kernel.

Regarding macros...

Well, to start with, there's the brute-force approach of simply embedding textual macro definitions. That might be good enough for most use cases in practice: as far as I know, most BPF hooks are written in either C or the C-like bpftrace language, so expanding macros as text would probably give a sensible result for the majority of macros that aren't particularly complex. And macro definitions are already included in the DWARF info, so the DWARF-to-BTF approach from your link could be easily extended to embed them.

But it would be nice to describe macros in a more structured format, which could allow use from non-C-like languages and would probably save on file size. Some prior art I'm familiar with is rust-bindgen, which generates Rust bindings for C headers using libclang, and supports translating C macros that expand to constants. Basically it checks each macro that's defined without arguments and uses libclang to try to evaluate it as a C constant expression; this will fail for macros that expand to things other than constant expressions, but it just ignores those. If evaluation succeeds, it translates the macro to a typed Rust constant declaration.

It might be possible to do something similar for BTF. As output format, either add a new 'constant integer' node, or translate such macros as if they were enum definitions. For Linux it would probably be best to avoid a dependency on libclang, but a custom parser might work, or maybe a hackier approach based on feeding things to the C compiler like:

    enum { value_of_SOME_MACRO = ((((((((( SOME_MACRO ))))))))) };
and sorting through the resulting morass of compiler errors :)
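Spelled out a little, as a sketch: the `RING_*` macros are hypothetical stand-ins for whatever a header defines. A macro that expands to an integer constant expression compiles inside the enum and the enumerator carries its value; anything else is rejected by the compiler, which is the "failure" signal you would filter on.

```c
#include <assert.h>

/* Hypothetical macros standing in for ones found in a real header. */
#define RING_ORDER 4
#define RING_SIZE  (1 << RING_ORDER)
#define RING_MASK  (RING_SIZE - 1)

/* If the macro is an integer constant expression, this compiles and the
 * enumerator records its value. A macro expanding to a statement, a
 * string, or a partial expression would be a compile error here. */
enum { value_of_RING_SIZE = (RING_SIZE) };
enum { value_of_RING_MASK = (RING_MASK) };

/* The values are available at compile time, with no need to run anything. */
static_assert(value_of_RING_SIZE == 16, "1 << 4");
static_assert(value_of_RING_MASK == 15, "16 - 1");
```

One caveat: before C23, enumerators have type int, so macros whose values don't fit in an int would need a different carrier.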

Edit: Forgot to mention – functional macros would be nice too, but of course they're much harder to translate. And heck, what about inline functions? Convert them to BPF?

> there's the brute-force approach of simply embedding textual macro definitions. That might be good enough for most use cases in practice

I very much want this for usage from Rust, so that doesn't suffice.

> It might be possible to do something similar for BTF. As output format, either add a new 'constant integer' node

That sounds promising to me, for the common case.

> Edit: Forgot to mention – functional macros would be nice too, but of course they're much harder to translate. And heck, what about inline functions? Convert them to BPF?

In an ideal world, 1) emit a symbol for them so they can be used from any language, albeit not "inline", and 2) compile them to bytecode that LTO can incorporate and optimize, for languages using the same linker.

Neither of those would work for macros designed for especially unusual usages that can't possibly work as functions. (The two most common cases I can think of: macros that accept names and use them as lvalues, and macros that emit partial syntax such as paired macros emitting unmatched braces.) But honestly, flagging those and handling all the common cases via BTF information would still be a huge improvement.
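To illustrate the split, a small sketch with made-up macros: one expression-like macro that can be exported as an ordinary symbol, and the two awkward kinds that can't be turned into functions.

```c
#include <assert.h>

/* All names here are invented for illustration. */

/* Expression-like macro: wrapping it in a function gives other
 * languages a symbol to call, at the cost of losing inlining. */
#define CLAMP_BYTE(v) ((v) < 0 ? 0 : (v) > 255 ? 255 : (v))
int clamp_byte_symbol(int v) { return CLAMP_BYTE(v); }

/* Takes names and assigns to them. A function taking ints by value
 * would only ever see copies, so this cannot be a plain function. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Paired macros emitting unmatched braces. Neither half is valid
 * syntax on its own, so there is nothing to emit a symbol for. */
#define COUNTED_BEGIN(counter) { int *counted_ = &(counter); ++*counted_; {
#define COUNTED_END            } --*counted_; }

static int swap_and_count_demo(void)
{
    int x = 1, y = 2, depth = 0, seen = -1;

    SWAP_INT(x, y);             /* x and y themselves are modified */

    COUNTED_BEGIN(depth)
        seen = depth;           /* runs with depth incremented */
    COUNTED_END

    assert(depth == 0);         /* the END half restored the counter */
    return x * 100 + y * 10 + seen;
}
```

A tool generating type information could flag the second and third kinds as untranslatable and still handle the first kind mechanically.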

Perhaps we should continue this on an IRLO thread?

Debug symbols, which can be resolved at load time. This would also make ebpf bytecode more or less kernel version independent.