Hacker News new | ask | show | jobs
by Syntaf 807 days ago
This reminds me of one of my favorite CS assignments in college for a systems programming class:

    > Given a hello world C++ snippet, submit the smallest possible compiled binary
I remember using tools like readelf and objdump to inspect the program and slowly rip away layers and compiler optimizations until I ended up with the smallest possible binary that still outputted "hello world". I googled around and of course found someone who did it likely much better than any of us students could have ever managed [1]

[1]: https://www.muppetlabs.com/%7Ebreadbox/software/tiny/teensy....

3 comments

Does it even matter that the snippet was c++? Couldn't you just get the smallest binary that prints hello world and argue that they are semantically equivalent?

That should be like 10 x86 instructions tops, plus the string data.

I'd note that the smallest instruction sequence doesn't necessarily correspond to the smallest executable file, since you might be able to combine some instruction data with header data to make the file smaller.

For instance, when I tried to make the smallest x86-64 Hello World program (https://tmpout.sh/3/22.html), I ended up lengthening it from 11 to 15 instructions, while shortening the ultimate file from 86 to 77 bytes.

IIRC there's some setup code that's injected before main is called and can be turned off in most compilers (_start?). Should be possible to get down to essentially the same thing you can do with an assembler. C (and so C++ for the most part) is essentially a portable assembler. You can probably also just inline those 10 x86 instructions directly.
It may or may not not possible, but it definitely won't be as easy. Binaries compiled with GCC (and other mainstream compilers) are pretty bloated. You may be able to get the same thing as assembler by manually excluding sections and functions from the output, but it's not just a single commandline parameters.

And the setup code is not completely useless, for example it is responsible for parsing command line parameters and actually running main. If you want to actually use libc instead of just writing assembly directly, you may need it.

You can strip

>_start Almost but not quite! _start is the name of the symbol that is executed first in the program. If you compile with -nostdlib, you need to provide this symbol yourself (this will be your new main).

I don't think the smallest binary was the goal but rather the smallest compiled binary. I assume they were restricted to a specific compiler as well.
At least on DOS, it's more like 4 instructions:

    mov ah, 9
    mov dx, 108
    int 21
    ret
    db "hello world!$"
Our curriculum was c++, so it was familiar territory for the students in terms of both the language and the compiler but yeah you can likely do this exercise with any compiled language.
Seems like it would miss the point of the exercise, which was to learn what a C++ compiler puts into an executable. Though I do realize that if there was a magic wand to get a grade without doing the work, some of us would take it.
If it is a favourite task, then why do I not see more folks creating the smallest possible binaries, for programs other than "hello world". Is it truly enojyable. It is for me because I like saving space on the computers I own. But I am seeing many, many 10+, 20+, 50+ and 100+ MiB programs being written by others in recent times. Some are created in commercial environments, for commercial purposes, but others are allegedly written for the enjoyment. Is it not enjoyable to write small programs.
Because building small binaries takes more human effort than building large ones. Most developers would rather invest that effort in building other stuff (like user-facing features) than invest it in shrinking their binaries.
That post is deserves its own post on HN, what a ride.