Hacker News new | ask | show | jobs
by eddd-ddde 809 days ago
Does it even matter that the snippet was c++? Couldn't you just get the smallest binary that prints hello world and argue that they are semantically equivalent?

That should be like 10 x86 instructions tops, plus the string data.

6 comments

I'd note that the smallest instruction sequence doesn't necessarily correspond to the smallest executable file, since you might be able to combine some instruction data with header data to make the file smaller.

For instance, when I tried to make the smallest x86-64 Hello World program (https://tmpout.sh/3/22.html), I ended up lengthening it from 11 to 15 instructions, while shortening the ultimate file from 86 to 77 bytes.

IIRC there's some setup code that's injected before main is called and can be turned off in most compilers (_start?). Should be possible to get down to essentially the same thing you can do with an assembler. C (and so C++ for the most part) is essentially a portable assembler. You can probably also just inline those 10 x86 instructions directly.
It may or may not not possible, but it definitely won't be as easy. Binaries compiled with GCC (and other mainstream compilers) are pretty bloated. You may be able to get the same thing as assembler by manually excluding sections and functions from the output, but it's not just a single commandline parameters.

And the setup code is not completely useless, for example it is responsible for parsing command line parameters and actually running main. If you want to actually use libc instead of just writing assembly directly, you may need it.

You can strip

>_start Almost but not quite! _start is the name of the symbol that is executed first in the program. If you compile with -nostdlib, you need to provide this symbol yourself (this will be your new main).

I don't think the smallest binary was the goal but rather the smallest compiled binary. I assume they were restricted to a specific compiler as well.
At least on DOS, it's more like 4 instructions:

    mov ah, 9
    mov dx, 108
    int 21
    ret
    db "hello world!$"
Our curriculum was c++, so it was familiar territory for the students in terms of both the language and the compiler but yeah you can likely do this exercise with any compiled language.
Seems like it would miss the point of the exercise, which was to learn what a C++ compiler puts into an executable. Though I do realize that if there was a magic wand to get a grade without doing the work, some of us would take it.