by titzer 1760 days ago
The cycle of crap on Linux is different; less corporate and underhanded, but more of an ever-expanding bloat. Install Clang/LLVM. It's like 300MB. I remember when almost every system had a fairly small C compiler. It had to be small, because it was the basis of everything else. Now the base is enormous. OCaml is something like 200MB. And of course, it has its own package manager. So does Python, and Ruby, and all these other things that are supposed to be the base of so much other software.

Take another example. Install LaTeX. It's something like 5GB. It's huge because it bundles enormous numbers of packages.

It seems like there are zillions of Linux packages > 100MB. What does all this crap do? Why does everything depend on everything?

Take another example: Node. Building it from scratch takes a pretty beefy machine and a lot of time (over 20 minutes on my 6-core workstation with 32GB RAM). Most of that is building V8. When I worked on V8, we periodically spent some time trying to get build times under control, but the needle barely moved before they started creeping up again. We spent months and months of effort, over years, splitting V8 into more source files and more directories and enforcing header discipline, but all of it made build times worse. Despite how cool V8 is, I feel embarrassed in retrospect that the build system is so bonkers.

Linux is like this everywhere. Monstrous and labyrinthine. It really is impossible to understand it all now.

7 comments

Linux has nothing to do with Node, though. Most of the Linux world (and the Windows world, and the OSX world, etc) wishes it would just go away forever. Javascript is unadulterated pain.

As far as Linux package sizes go... why are you installing so many packages that you don't need just to complain about it?

On Debian and Ubuntu, `dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n` will tell you the install size of things sorted by worst offender; for me, on one of my development machines, the worst is git, followed by Perl packages required by the system, neovim, and then a bunch of normal things expected on any install. `df -h` minus `/home` is a hair over 600MB.

git, being the largest thing, has an install size of 38 megs. Indeed, I cannot tell you why git is 38 megs, there may or may not be bloat here.

As a comparison, Windows uses around 6GB of space, and an MSVC install that has a common set of toolchains may take up to 20GB and... arguably does less than my <1GB Linux install (when it comes to dev work, anyways).

> why are you installing so many packages that you don't need just to complain about it?

there are 1000s of packages, who has time to select a small subset of that? easier to install chubby swathes at a time.

and you can't install devel versions without bringing in many various forms of TeX

recently I installed the emacs package on a system and it required python; that's sacrilege

Your dev linux machine is a gigabyte? That's kind of surprising. Just llvm+clang on my machine according to your query clocks in at over a gigabyte. Are you only developing using perl and not doing any from-source builds?

The machine I looked at doesn't have gcc installed. With gcc, it'd probably cross the 1GB line, but not by far.

dpigs from debian-goodies is also a nice way to find bloated packages on Debian-based systems.

While I share some of your resentment (especially as a Gentoo user who builds Chromium quite regularly), a few extra gigabytes of storage or a few more config files I don't grok are relatively easy to ignore compared to the kind of dark patterns I see happening with 3rd party software on Windows desktops. And Microsoft itself is increasingly willing to sink to that level as well.

> Install Clang/LLVM. It's like 300MB

Isn't this like a first computer world problem? How many gigabytes does current Visual Studio take? Not talking about VS Code, because it's only an IDE, which still requires the actual Visual Studio C++ as the compiler.

If anything, it seems like a non-first world problem. I assume they have older, less powerful hardware in developing countries.

It is possible to have “headless” MS compiler & build tools in Windows, by installing Windows SDK. It may be used with VSCode.

The full Visual Studio is not required.

How much space does it take?

A surprisingly large amount. Many gigs. I'm not on my Windows machine right now, so I can't give you any figures, but it's enormous.

> Install LaTeX. It's something like 5GB

Only if you install the full version, which has every single package for everything, all with their own manuals etc. At least on Debian-based distros there are options to get just the ones relevant to you.

Maybe, but it feels unfair to bring software bloat into a comparison between Linux and Windows. I don't think Linux could ever come close to the degree of software bloat in Windows. Your Clang install is 300MB? Sure. My Visual Studio install is 4.5GB, and it includes at least two versions of Clang (the DirectX shader compiler is a fork of an old version of Clang.) You think dependencies are a problem? You should be happy there's only one copy of each library on your computer (two if you have a multilib system.) I counted more than 25 copies of the C runtime on one Windows computer. You think Linux is monstrous and labyrinthine. Have you seen the scope of the Windows API? There's no comparison.

> It seems like there are zillions of Linux packages > 100MB. What does all this crap do? Why does everything depend on everything?

I dunno man, but software can get complicated, and a lot of these packages probably have features that people use. Why quibble over hundreds of megabytes when popular Windows packages like Office and Creative Cloud are multiple gigabytes? It all seems very unfair that Linux is subject to this level of scrutiny when other systems have worse bloat by literal orders of magnitude.

Modern Linux user space is monstrous and labyrinthine. Linux itself is not. I found it to be a really clean system.

The basis of everything is actually the Linux system call binary interface. We can actually trash the entire user space and start from scratch with nothing but Linux. We can even trash the ubiquitous GNU stuff if we want.

Why can't we have a compiler with built in system call support? Just add a system_call keyword that inlines Linux system call code using the supplied parameters. No need for libc bullshit in the middle. No need for C or any specific language. Someone could make a language today and that single feature would make it as capable as C is for systems programming. They could write software and boot Linux directly into it.
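
For what it's worth, C compilers with inline assembly can already get close to that hypothetical system_call keyword. Below is a minimal sketch, assuming x86-64 Linux and GCC/Clang extended asm. The names raw_syscall3 and raw_write are made up for illustration, and on any other target the sketch falls back to libc's syscall() just so it still compiles.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* only for the SYS_* numbers */
#include <unistd.h>       /* only for the non-x86-64 fallback */

/* Hypothetical helper: issue a three-argument system call directly. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* x86-64 Linux convention: number in rax, arguments in rdi, rsi, rdx.
       The syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;  /* on failure the kernel returns a negative errno value */
#else
    /* Assumption: other targets simply go through libc here. */
    return syscall(nr, a1, a2, a3);
#endif
}

/* Hypothetical wrapper: write(2) without the libc wrapper on x86-64 Linux. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(SYS_write, fd, (long)buf, (long)len);
}
```

Something like raw_write(1, "hello\n", 6) then prints to stdout. A real libc-free program would also need its own process entry code, which this sketch leaves out.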

> Why can't we have a compiler with built in system call support?

Funny you should ask that. That is exactly how Virgil's compiler supports the Linux (and Darwin) kernels. Other than generating a small amount of startup assembly (10-20 ins), the compiler just knows the ELF (and MachO) binary formats and the calling conventions of the respective kernels. With some unsafe escape hatches (e.g. getting a pointer into the middle of a byte array), the rest is regular Virgil code that calls the kernel directly.

Take a look, I've been working on this for more than 10 years:

https://github.com/titzer/virgil/blob/master/rt/x86-64-linux...

The "Linux.syscall" is a special operator known to the compiler and it will let you pass an int (the syscall number) and whatever arguments you want (any types--it is implemented with flattening and polymorphic specialization) to the kernel.

With this I have implemented all kinds of stuff, including the userspace runtime system and even a JIT compiler (for my new Wasm engine).

Thanks for this, it's extremely awesome! Really happy to see others have gone so much farther than I ever did.

I started looking into this myself some years ago. Even started developing a liblinux with process startup code and everything. Abandoned it after I found the kernel itself had an awesome nolibc.h file that was much more practical for my C programming needs:

https://elixir.bootlin.com/linux/latest/source/tools/include...

My code is in a bad state but if you'd like to take a look:

https://github.com/matheusmoreira/liblinux

It's amazing how this really lets you do everything... Want a JIT compiler? Map some executable pages and emit some code. You can statically allocate memory at process startup and use that for bootstrapping code. This lets you implement dynamic memory allocation and even garbage collection in your own language.
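
A rough C sketch of the "map some executable pages and emit some code" step, assuming x86-64 and a POSIX mmap/mprotect. The helper name jit_const_fn is hypothetical, and it goes through the libc wrappers rather than raw system calls to keep it short. On other architectures it just returns NULL.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*const_fn)(void);

/* Hypothetical helper: emit a tiny function that returns `value`.
   Returns NULL if the page cannot be mapped or the target is unsupported. */
static const_fn jit_const_fn(int32_t value)
{
#if defined(__x86_64__)
    /* Machine code for: mov eax, imm32 ; ret */
    unsigned char code[6] = { 0xB8, 0, 0, 0, 0, 0xC3 };
    memcpy(&code[1], &value, sizeof value);  /* x86-64 is little-endian */

    size_t len = 4096;
    /* Map the page writable first, then flip it to executable. */
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, code, sizeof code);
    if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, len);
        return NULL;
    }
    return (const_fn)page;
#else
    (void)value;
    return NULL;  /* assumption: only x86-64 is handled in this sketch */
#endif
}
```

Calling the pointer returned by jit_const_fn(42) should give back 42 on x86-64. Keeping the page writable and executable at different times is a design choice here, not something the comment above requires.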

Nice work!

> Want a JIT compiler? Map some executable pages and emit some code.

Yep, this is exactly what Wizard does.

Targeting POSIX standard functions like open by going through the Linux syscall table looks like just making work for yourself when porting this to other systems.

Some syscalls don't correspond to standard library functions. As an example, if you want to bind to opendir/readdir/closedir, you have to write those yourself in terms of the Linux-specific __NR_getdents64 system call.
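
To give an idea of what that involves, here is a minimal sketch of a readdir-style loop over getdents64, assuming Linux. The helper count_dir_entries is a made-up name, the struct is declared by hand following the layout in the getdents(2) man page, and the calls go through libc's syscall() purely to keep the example short. It opens the directory with openat rather than open.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout the kernel writes for getdents64 (declared by hand). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;  /* total size of this record */
    unsigned char  d_type;
    char           d_name[];
};

/* Hypothetical helper: count entries (including "." and "..") in a
   directory. Returns -1 on error or on non-Linux targets. */
static long count_dir_entries(const char *path)
{
#if defined(__linux__) && defined(SYS_getdents64)
    long fd = syscall(SYS_openat, AT_FDCWD, path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    _Alignas(struct linux_dirent64) char buf[4096];
    long count = 0;
    for (;;) {
        long n = syscall(SYS_getdents64, (int)fd, buf, sizeof buf);
        if (n < 0) { count = -1; break; }  /* error */
        if (n == 0) break;                 /* end of directory */
        /* Records are variable length; d_reclen says where the next starts. */
        for (long off = 0; off < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            count++;
            off += d->d_reclen;
        }
    }
    syscall(SYS_close, (int)fd);
    return count;
#else
    (void)path;
    return -1;
#endif
}
```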

Is your LinuxConst.SYS_open actually __NR_open? That's supposedly obsolete. glibc uses __NR_openat for open(). __NR_open is listed in the asm/unistd.h header in a section under the heading "All syscalls below here should go away really ..."

How about signal handling; are you dealing with sigreturn and all that?

You can get a small executable footprint (in terms of not requiring a dynamic C library) by maintaining all this yourself, though.

Oh, I know it's work, but I am not going to assume POSIX, as that's implemented in userspace with C code. In my universe, C code doesn't exist (except I use a little in some testing utilities in order to get going on a new platform). I never ported to Windows, but doing so would be as simple as teaching the compiler the Windows kernel calling convention, adding that little process entry code, and then writing an implementation of System using Windows calls. Oh, yeah, and generating COFF :)

Virgil has its own calling convention internally (though this is basically System V on x86-64). That only matters when getting into V3 code or out, e.g. process entry, calling the kernel, and signal entry. For signals, the compiler generates a tiny stub that copies the signal handler arguments into the V3 registers and then calls user code. To install signals, user code just needs to fill out the right sigaction buffer, as any other system call. To return from signals properly, I studied assembly examples I found online. The runtime doesn't use signals for anything other than catching fatal errors (DivideByZero and NullCheck), so it just prints a source-level stacktrace and then exits. But Wizard needs to recover from signals in order to do proper OOB handling of user programs, so it actually does the proper sigreturn dance, but Wizard only does the fancy stuff on 64-bit.
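
For readers who haven't done it, "filling out the sigaction buffer" looks roughly like this in C. This is only a sketch using the libc sigaction() wrapper, not Virgil's or Wizard's actual code, and install_handler and on_signal are hypothetical names. As far as I know, on x86-64 Linux the libc wrapper also supplies the restorer trampoline that performs sigreturn, which is the part a libc-free runtime has to provide itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler so the rest of the program can observe delivery. */
static volatile sig_atomic_t last_signal = 0;

/* Three-argument handler form, selected by SA_SIGINFO below. */
static void on_signal(int signo, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    last_signal = signo;
}

/* Hypothetical helper: install on_signal for `signo`. Returns 0 on success. */
static int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);  /* block no extra signals while handling */
    return sigaction(signo, &sa, NULL);
}
```

After install_handler(SIGUSR1), a raise(SIGUSR1) runs the handler and returns normally, with last_signal set to SIGUSR1.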

In my universe, only three things exist: Virgil, wasm, and machine code. I have no need of other languages except as means to test those others :)

Virgil runs on the JVM and on Wasm too, and those require slightly different ways of getting off the ground.

> porting this to other systems

Why care about this? I want Linux on everything instead.

> Why can't we have a compiler with built in system call support? Just add a system_call keyword that inlines Linux system call code using the supplied parameters

It can be implemented as a small function that first appeared in 4BSD. It's available in Linux.

$ man syscall

(Unfortunately, this function lives in glibc. Obviously, though, it doesn't have to. All I'm saying is that this, or a similar function, can be a linkable symbol in some small compiled object file, and not an inline primitive that has to live in the compiler.)
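
A small usage sketch of that function, assuming a libc that provides syscall() and the SYS_* constants from sys/syscall.h. The wrapper name close_via_syscall is made up for illustration. Note that syscall() follows the libc convention of returning -1 and setting errno, rather than handing back the kernel's negative error code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: close a file descriptor by system call number. */
static int close_via_syscall(int fd)
{
    /* On failure this returns -1 and the error code lands in errno. */
    return (int)syscall(SYS_close, fd);
}
```

Calling close_via_syscall(-1) should fail with errno set to EBADF.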

You're of course correct about all this. I believe the glibc thing has created mainly cultural problems. People don't look at Linux as a separate thing.

If I look up Linux system calls on Wikipedia I get diagrams showing glibc wrapping the Linux system call interface because that's what you're supposed to be using. If I look at Linux man pages what I really get is glibc man pages with the actual system calls being almost an afterthought. Glibc wrappers actually do a ton of stuff like add cancellation mechanisms. Glibc also drops support for system calls that break their threading model such as clone.

It's the same problem with systemd. I look up Linux init system man pages and get systemd stuff instead. I expected to see kernel APIs useful for writing my own.

This isn't any different from any current or historic Unix. The system interface has been a C library going back to early Unix.

The library and kernel interface are more separated in Linux systems than in prior Unixes, with user space C libs being totally separate projects.

Over a Linux kernel you can find glibc, uClibc, musl, Android's Bionic (derived largely from BSD libc), ...

My gcc is tens of KBs

$ wajig size | grep gcc
gcc-5-multilib         6  installed
gcc-multilib           8  installed
gcc                   44  installed
gcc-6-base            60  installed
gcc-5-base            66  installed
libx32gcc1            98  installed
libgcc1              105  installed
lib32gcc1            125  installed
libx32gcc-5-dev    6,280  installed
lib32gcc-5-dev     7,020  installed
libgcc-5-dev      12,193  installed
gcc-5             23,648  installed