by svdree 3045 days ago
Sean Barrett (who I think popularized the idea) has a FAQ on this (https://github.com/nothings/stb) where he justifies it by pointing at difficulties with deploying libraries on Windows. Which is a fair point, but by going straight to header-only he skips the step where you can also just distribute a bunch of headers and .C files. The convenience of only having to include a single header is nice for quick weekend projects, but for anything bigger you're dealing with dependencies and build issues anyway.

I get some of the reasons that you would initially start out with a header-only implementation, but when your library grows, you probably want to split it at some point. For me personally, that point would be some time before the header reached 25k (!!) lines.


Some advantages of header-only vs .h/.c pair:

- you can build simple tools contained in a single .c file, and you don't need a build system for this (e.g. just call "cc tool.c -o tool" instead of mucking around with Makefiles or cmake)

- the library can be configured with macros before including the implementation (e.g. provide your own malloc, assert, etc...), with the implementation in a .c file, these config macros must be provided via command line args, which implies a build system

- you can put the implementations for all header-only libs used in a project into a single .c file instead of compiling the multiple .c implementation files, this might speed up compilation
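
To make the second point concrete, here is a minimal sketch of configuring a library through macros defined before the include. Everything in it is hypothetical: "mylib.h", the MYLIB_* macro names, mylib_iota and the demo_* helpers are made up for illustration and are not taken from stb or any real library. The header contents are pasted inline so the sketch compiles as one file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- user code: configuration comes BEFORE the include ---- */
static size_t demo_alloc_calls = 0;   /* hypothetical bookkeeping */

static void *demo_malloc(size_t size) {
    demo_alloc_calls++;               /* count allocations made by the library */
    return malloc(size);
}

#define MYLIB_MALLOC(size) demo_malloc(size)
#define MYLIB_FREE(ptr)    free(ptr)
#define MYLIB_ASSERT(cond) assert(cond)

/* ---- what a hypothetical "mylib.h" would contain ---- */
/* Fall back to the C library only if the user did not override anything. */
#ifndef MYLIB_MALLOC
#define MYLIB_MALLOC(size) malloc(size)
#endif
#ifndef MYLIB_FREE
#define MYLIB_FREE(ptr) free(ptr)
#endif
#ifndef MYLIB_ASSERT
#define MYLIB_ASSERT(cond) ((void)0)
#endif

/* Returns a heap array holding 0, 1, ..., count-1, or NULL on failure. */
static int *mylib_iota(size_t count) {
    int *out;
    MYLIB_ASSERT(count > 0);
    out = (int *)MYLIB_MALLOC(count * sizeof *out);
    if (out != NULL) {
        for (size_t i = 0; i < count; i++) out[i] = (int)i;
    }
    return out;
}
```

With a separate .c implementation file, the same three macros would have to reach the compiler some other way, for example as -D flags or through a shared config header.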

Single-header libs have some of the advantages that a proper module system would bring to C and C++ (e.g. extremely simple integration into own project and increased compile speed), but without all the complexity of implanting a module system into C or C++.

For #1: you already have multiple files in your source tree, so nothing stops you from #including the .c file directly if you want to compile a single C file from the command line without a build tool (although with make you can usually have a single generic makefile that does the same job)

For #3: the same answer as for #1 applies here too (although beware of conflicts between static names), but in practice C code compiles fast enough for this to not be a problem

I can see #2 being an advantage, but TBH I think the case where you both need a custom malloc, assert, etc. and don't have a build tool through which you can pass the configuration macros is kinda rare.

Make is so simple that it can actually build your single-file project without a makefile, with fewer characters to type:

make tool

Et voilà.

Errr, I actually have to spend 5 minutes googling how to install make on Windows, then getting mingw and hating myself. So no, no it's really not that simple in general.
https://github.com/tom-seddon/b2/blob/master/snmake.exe - standalone build of GNU Make 3.80. Has a couple of hacks in it to better support Windows-style paths with colons in them. I think this came from the SN Systems Playstation2 SDK. Thanks to the GPL, you can have it too.

(I have no idea whether this will successfully run a C compiler for you on Windows, though. I mostly use it to run Makefiles with phony targets as a cross-platform replacement for shell scripts.)

Oh no, Make 3.80...

Why not build a more recent version? This thing is 15 years old!

This sounds like something one should ask people who do not write portable makefiles, rather than porters of Make, which has been standardised for quite a while now.
Good idea! Let me know when you’ve done it... I could do with an updated copy.
Not really fair to blame make for your daft OS.
1) A new dependency to install on every build server and dev's machine unless you're a 'nix only shop (e.g. not in gamedev) and sometimes even then (yes, I've had to apt-get install make) - at the very least this is a boatload of new setup/install instructions, in practice this also involves coordinating with IT, and heaven help you if you let your less programmery coworkers build from source too.

2) Which version of make? I have detailed instructions at work about which make to use with which makefiles such that all the stars align and the OS vars and other gnu tools line up such that our upstream Makefiles work without modifications (well, usually.)

3) Given the complexity of 2, what I'm actually going to do is automatically invoke make from whatever build system we're using internally.

4) Now that I have two sets of build configuration, I have a continual maintenance burden as conflicting dependencies and incompatible build settings need to be resolved. Since I deal with C++ libs and tools, sneezing too hard will cause incompatible libs. In practice, the Makefile will hardcode CCFLAGs, sometimes CC itself, and things like building with/without RTTI will cause incompatible libs I can't link without more Makefile tweaks.

5) When I get sufficiently fed up with the state of affairs in point 4, I'll integrate the tool into our own build system such that we have a single unified build system again and I don't have to make the same change in 5 places (our codebase + a measly 4 dependencies in this example.)

At this point I'm no longer using make and wondering why I didn't just skip directly to step 5 in the first place. Make is simple - so simple it doesn't address my needs nor solve my problems.

> he justifies it by pointing at difficulties with deploying libraries on Windows. Which is a fair point, but by going straight to header-only he skips the step where you can also just distribute a bunch of headers and .C files

That part was justified by deploying libraries for Windows. Going for one file only was justified by this:

"You don't need to zip or tar the files up, you don't have to remember to attach two files, etc."

--

Unrelated, and probably colored by the fact I first learned to program on Windows, but I don't get the problem. Windows applications usually bundle DLLs with them and keep them locally, unlike Linux applications which typically install dependencies globally through a package manager. I don't think I've ever had a big DLL problem developing on Windows, whereas on Linux I've occasionally been bitten by the "oh this software requires X <= v2.1 but you can't have that since something else is already using X v2.3 and that would be a downgrade".

For some reason, problems with dynamic libraries are called "DLL hell", not "SO hell". ;-)

If your software needs library foo.so.x while other application needs library foo.so.y, just put both into /usr/lib. Problem solved.

Easy, Windows already had dynamic loading on Windows 3.x, which was the same model used by OS/2, while UNIX was still trying to figure out how to implement dynamic libraries.

The first versions of dynamic linking on UNIX basically required patching a.out files, before ELF was designed.

So of course the expression for compatibility issues with dynamic libraries came to be "DLL hell"; there were no .so files to talk about.

Thanks for the link.

Why not two files, one a header and one an implementation? The difference between 10 files and 9 files is not a big deal, but the difference between 2 files and 1 file is a big deal. You don't need to zip or tar the files up, you don't have to remember to attach two files, etc.

I'm still not convinced. I am convinced about a .c and .h -- that's how sqlite does it. Going to just a .h seems to provide negligible benefit and confuses the implementation and interface. It probably confuses a lot of tools too, e.g. source navigation tools, code coverage, code instrumentation, etc.

I don't understand why anyone cares. You grab the .h file and call "load a jpeg"/"draw a button" or you grab two files and call "load a jpeg"/"draw a button"

Are we bikeshedding about this?

This is a matter of adhering to a sound engineering principle, and the approach in question has not been generally considered acceptable. To explain why, one of the ideas behind header files was that they could be freely reused in more than one part of a project; therefore, for example, any executable code appearing in a header file might end up existing in multiple copies throughout the executable (perhaps, depending on the linker).
> any executable code appearing in a header file might end up existing in multiple copies throughout the executable

that's why the keyword "inline" exists.

'Inline' does not prevent duplication of the generated code (in fact, it forces it).
... mostly. Except in the case where the inlined version can be optimized away, which is the best time to use inline but not entirely germane.
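
For reference, the form header-only C libraries commonly use for small helpers is `static inline`. A minimal sketch, with a made-up header name ("sq.h") and made-up function names, showing the header contents inline so it compiles as one file:

```c
#include <assert.h>

/* ---- hypothetical header "sq.h" ---- */
/* static inline: every translation unit that includes this header gets its
   own internal-linkage definition, so the linker never sees a clash. The
   compiler may expand calls in place and may drop the out-of-line copy if
   nothing needs it; whether it does is up to the optimizer. */
static inline int sq_square(int x) { return x * x; }

/* ---- user code ---- */
int sum_of_squares(int a, int b) {
    return sq_square(a) + sq_square(b);
}

/* Usage check. */
void sq_demo(void) {
    assert(sum_of_squares(3, 4) == 25);
}
```

So whether you end up with several copies of the generated code depends on how many translation units include the header and on what the optimizer decides in each one; the keyword alone settles neither.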
> and confuses the implementation and interface

Normally you have the implementation inside an "#ifdef IMPLEMENTATION" block, and the API interface (public structs and functions) outside of the implementation block, and all private functions inside the implementation block are defined as 'static' so they are not visible outside the special implementation source file. In the places where you include the header for normal use, only the public interface is accessible.
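
A minimal sketch of that layout, assuming a made-up library called "tinyclamp" (the names TINYCLAMP_IMPLEMENTATION, tinyclamp and the tinyclamp_priv_* helpers are hypothetical, not from any real library). The header is pasted inline so the sketch compiles as a single file:

```c
#include <assert.h>

/* In a real project this line lives in exactly ONE .c file,
   right before #include "tinyclamp.h". */
#define TINYCLAMP_IMPLEMENTATION

/* ======== begin hypothetical tinyclamp.h ======== */
#ifndef TINYCLAMP_H
#define TINYCLAMP_H
/* Public interface: visible to every file that includes the header. */
int tinyclamp(int value, int lo, int hi);
#endif /* TINYCLAMP_H */

#ifdef TINYCLAMP_IMPLEMENTATION
/* Private helpers are static, so they stay local to the one
   translation unit that compiles the implementation. */
static int tinyclamp_priv_min(int a, int b) { return a < b ? a : b; }
static int tinyclamp_priv_max(int a, int b) { return a > b ? a : b; }

int tinyclamp(int value, int lo, int hi) {
    return tinyclamp_priv_max(lo, tinyclamp_priv_min(value, hi));
}
#endif /* TINYCLAMP_IMPLEMENTATION */
/* ======== end hypothetical tinyclamp.h ======== */

/* Usage: any file that includes the header can call the public function. */
void tinyclamp_demo(void) {
    assert(tinyclamp(5, 0, 3) == 3);
}
```

Every other file includes the header without the define and sees only the prototype.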

I'm also not convinced by the value of single file especially for semi large libraries like Nuklear. Especially as you could #include the .c file just as well as the .h files if you need. That's as many lines as #define IMPLEMENTATION.

Nuklear was developed as multiple files and hastily merged at the last minute by its author before being tagged 1.0. I think it was a mistake, especially as he copied the entire stb_xxx files inside. And unlike libraries like stb_image.h, if you want to use Nuklear you need to set up or copy a non-trivial backend anyway, so it's not a matter of include and ready-to-use.

(Dear ImGui which Nuklear is based on is 7 files including 4 headers as you know.)

To me the core value of those libs is that they are highly portable, designed to compile everywhere without any specific build system, and designed to be compiled from source code. Because they are designed as such, problems (such as error/warnings on some setup) are caught easily and fast.

Whereas for bigger libraries you either get the headache of binaries, or you get the headache of figuring out their build systems and building from source, which frequently fails. And as people don't frequently build them themselves, portability problems aren't caught as often.

OK I get that the big .h file is still logically separated into an .h and .c, like you say.

But what about preprocessing times? If you're including a library from many of your source files, then even if it always hits the #if 0 case, the preprocessor still has to parse the implementation. It matters for distributed compilation too -- more preprocessed bytes have to be sent over the network.

I'm sure there are cases where this overhead is negligible. But I'm just as sure there are some where it's not. Not caring about how much text is in your headers seems like a bad habit to get into. Build times are the main reason I don't use C and C++ more.

The C preprocessor is the least of your worries with regard to compile times. Parsing a #if 0 can be done almost as quickly as a memcmp operation. Even my naive implementation can process an #if 0 at around 600 MBps. Even if you had a gigabyte of text in an #if 0, that's only about 2 more seconds on the compile, provided your disk can manage the throughput.
> Even if you had a gigabyte of text in an #if 0, that's only about 2 more seconds on the compile

Are you sure your implementation is correct? How does it handle this:

    #if 0
    "\
    #endif \
    "
    #endif
I'm not saying gcc's and clang's preprocessors are not really fast, but preprocessing is trickier than most people expect. In particular (as you can see from my example), while skipping over an "#if 0" you still have to split the source into tokens and discard the tokens.
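
To show the kind of work involved, here is a rough sketch of a scanner that finds the matching #endif while honoring backslash-newline splices, comments, string/character literals and nested conditionals. It is an illustration written for this thread, not how gcc or clang do it, and it deliberately ignores #else/#elif, trigraphs, digraphs, CRLF line endings, comments between '#' and the directive name, and C++ raw strings. The function names are made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Step over backslash-newline pairs (line splicing) starting at s[i]. */
static size_t skip_splices(const char *s, size_t i) {
    while (s[i] == '\\' && s[i + 1] == '\n') i += 2;
    return i;
}

/* s points just past the line of an opening #if. Returns the offset of the
   '#' of the matching #endif, or -1 if there is none. */
long find_matching_endif(const char *s) {
    size_t i = 0;
    int depth = 1;        /* the #if we are already inside */
    int line_start = 1;   /* only whitespace/comments so far on this logical line */

    for (;;) {
        i = skip_splices(s, i);
        char c = s[i];
        if (c == '\0') return -1;
        if (c == '\n') { line_start = 1; i++; continue; }
        if (c == ' ' || c == '\t') { i++; continue; }

        if (c == '/') {
            size_t j = skip_splices(s, i + 1);
            if (s[j] == '*') {          /* block comment: counts as whitespace */
                i = j + 1;
                while (s[i] && !(s[i] == '*' && s[skip_splices(s, i + 1)] == '/')) i++;
                if (s[i]) i = skip_splices(s, i + 1) + 1;
                continue;
            }
            if (s[j] == '/') {          /* line comment: to end of logical line */
                i = j + 1;
                while (i = skip_splices(s, i), s[i] && s[i] != '\n') i++;
                continue;
            }
        }

        if (c == '#' && line_start) {   /* a directive */
            size_t hash = i++;
            char name[8] = {0};         /* longest name we care about is "ifndef" */
            size_t n = 0;
            while (i = skip_splices(s, i), s[i] == ' ' || s[i] == '\t') i++;
            while (i = skip_splices(s, i), isalnum((unsigned char)s[i]) || s[i] == '_') {
                if (n < sizeof name - 1) name[n++] = s[i];
                i++;
            }
            if (!strcmp(name, "if") || !strcmp(name, "ifdef") || !strcmp(name, "ifndef")) depth++;
            else if (!strcmp(name, "endif") && --depth == 0) return (long)hash;
            line_start = 0;
            continue;
        }

        line_start = 0;
        if (c == '"' || c == '\'') {    /* string or character literal */
            i++;
            for (;;) {
                i = skip_splices(s, i);
                if (s[i] == '\0' || s[i] == '\n') break;   /* unterminated */
                if (s[i] == c) { i++; break; }
                if (s[i] == '\\') {     /* escape: skip the escaped character too */
                    i = skip_splices(s, i + 1);
                    if (s[i] && s[i] != '\n') i++;
                } else {
                    i++;
                }
            }
            continue;
        }
        i++;
    }
}

/* The example above, minus its first "#if 0" line and the indentation. */
void demo_tricky_endif(void) {
    const char *body = "\"\\\n#endif \\\n\"\n#endif\n";
    assert(strstr(body, "#endif") - body == 3);   /* naive search stops too early */
    assert(find_matching_endif(body) == 14);      /* splice-aware scan finds the real one */
}
```

For this particular input, handling the line splices is already enough, because a directive is only recognized at the start of a logical line. The literal tracking matters for inputs where a string contains "/*", and the comment tracking for an #endif that sits inside a multi-line comment.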
Absolutely, great example! I have looked at quite a few implementations of preprocessors, and they're not simple.

I don't trust anyone who thinks it's simple without actually having implemented it -- and tested it on real code.

One older thread: https://news.ycombinator.com/item?id=10945552

I agree with the sentiment, but it should be easy enough to smash line continuations together while you're searching for that #endif (famous last words)
Is your naive preprocessor implementation a complete, standards-compliant C99 preprocessor? Or just some bastardization of it? If not, the 600 MBps benchmark is as good as useless.

What's the C preprocessor speed of GCC/Clang/MSVC? Any benches?

Some tests on my system (Intel 4770HQ), including everything at the root of /usr/include:

    $ for file in /usr/include/*.{h,hpp} ; do echo $file | awk '{ print "#include \"" $1 "\"" }' >> /tmp/test.cpp ; done
    $ (laborious step consisting in removing headers that don't play nice with others... in the end there are still more than 1425 base headers, which end up including most of Qt4 & Qt5, boost, etc)
    $ time g++ -E /tmp/test.cpp -fPIC -std=c++1z -I/usr/include/glib-2.0 -I/usr/include/qt -I/usr/include/glibmm-2.4 -I/usr/include/glib-2.0/glib/  -I/usr/lib/glib-2.0/include -I/usr/include/gdkmm-2.4 -I/usr/include/gdkmm-3.0 -I/usr/include/gtk-3.0 -I/usr/include/pango-1.0 -I/usr/include/cairo -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/atk-1.0 -I/usr/include/qt4/ -I/usr/include/qt/QtCore -I/tmp -I/usr/include/qt/QtWidgets -I/usr/include/qt/QtGui -I/usr/include/KDE -D_FILE_OFFSET_BITS=64 -DPACKAGE=1 -DPACKAGE_VERSION=1 -I /usr/include/raptor2 -w -I/usr/include/rasqal -I/usr/include/lirc/include -Wno-fatal-errors -I/usr/include/lirc -I/usr/include/gegl-0.3 -I/usr/include/libavcodec -I/usr/include/freetype2 -DPCRE2_CODE_UNIT_WIDTH=8 -I/usr/include/python3.6m -I/usr/include/libusb-1.0 > out.txt

    1,82s user 0,18s system 99% cpu 2,013 total
out.txt (the whole preprocessed output) is 710kloc and 23 megabytes.

With clang:

    0,72s user 0,08s system 91% cpu 0,875 total 

so I'd say that in general, preprocessing time is fairly negligible. A few template instantiations will take much more time to compile.
Well, you have to multiply it by the number of translation units.

Also, I like the point made in the sibling comment by moefh. Are you sure you don't need to tokenize?

Can I see your implementation? I have looked at a few implementations of C preprocessors, and they're not simple.