Hacker News new | ask | show | jobs
by IshKebab 1120 days ago
True but I think the real cause of this is surely that C makes it too hard to use a sorting library that someone competent has written. I would not be surprised if the author was fully aware of the N^2 complexity but opted for a simpler implementation anyway.
4 comments

qsort() not good enough for you?

(More realistically, below people are discussing that in the kernel environment the set of standard or third party library available may be unavoidably limited)

There is a quirk (almost a bug?) in FreeBSD's qsort where it will switch to insertsort for the whole array under certain conditions, which we hit in production given how our data was arranged.

(I think this was the check) https://github.com/freebsd/freebsd-src/blob/main/lib/libc/st...

Did a bit of digging and found that there used to be a comment for why it was done, but it got removed [0] when they switched to the implementation from Bentley & McIlroy's "engineering a sort function" [1] around 1992.

[0]: https://github.com/weiss/original-bsd/commit/d3fcf71e0db57cb...

[1]: https://cs.fit.edu/~pkc/classes/writing/papers/bentley93engi...

I like how someone felt the need to write out insertion sort in some 4 line code golf challenge in the midst of qsort. This right here is why no one wants to deal with C anymore.
qsort() is pretty slow if you're sorting something like long[]. In that case, radix sort goes 5x faster and vectorized quicksort goes 10x faster. If you're sorting int[] then vectorized quicksort goes 25x faster than qsort(). Nothing goes faster. The issue is it's a big ugly c++ monstrosity that's a huge pain to compile.
That's fair if the constant factor is relevant, but if bubble sort is terminating in any reasonable timescale then the difference between qsort, C++ std::sort, and a custom implementation is really not a factor.
People who don't compute much data don't need computer science.
But if you're comparing to bubblesort.....
No. In most languages sorting a container is `foo.sort()` or something similar. `qsort()` is much more faff.

I mean, clearly it wasn't good enough otherwise they would have used it, no? Perhaps integrating it was a huge faff.

Nonetheless, qsort() is available in libkern.
Not at the time that the bubble-sort code was used though. see https://news.ycombinator.com/item?id=36005209
You mean written. It's been available for many of the years that it has been used. (-:
> that C makes it too hard to use a sorting library that someone competent has written.

This makes no sense to me. What about C makes it hard to use a good library?

I think it comes down to ecosystem fragmentation.

A lot of languages now have common tooling (cargo for rust, pip for python, etc) which makes it easier to find and incorporate the libraries you want. Apparently there are tools like https://conan.io/ but they're not as widely-adopted.

C's build system is similarly non-uniform. Many packages use Makefiles, others use different build mechanisms.

C has no universally-agreed error handling mechanism or convention. Whether exceptions or golang's error interface, you can generally assume that a random package in most languages will handle errors the way you expect. In C it's a lot more varied.

Similarly memory allocation - sometimes in a larger application you want to customize how malloc and friends work. How and whether you can do that for a C package is non-uniform.

Mind you the C standard library has a sort() function which will have sensible big-O behavior on pretty much any platform. I suspect this specific problem is more to do with this being kernel-mode code which has a lot of special conditions.

I am always amazed by arguments that say that not having a language package manager like cargo or pip makes it hard.

Really? Is it really considered hard to link a library without them? Am I so old that somehow I grew up in a world where linking a library was not considered black magic?

There is a lot more going on here.

The first issue is actually downloading the dependencies, doing this manually quickly becomes infeasible for any non-trivial project.

The second issue is keeping everything updated, and making sure that all packages are compatible with all other packages. Doing this manually is also not easy.

With C specifically, you need to wrangle different build systems, and once you have them built and "installed", you need to figure out which linker and compiler flags are needed to consume the libraries.

If you are working on a small project with maybe a few dependencies you can do this by hand, but when you get to say, 15 dependencies, it quickly becomes very difficult.

You can use the system package manager on some systems to install libraries I guess (assuming it has the packages and versions that you need), in this case manually managing things could be a lot easier, but you still should be using pkg-config for portability purposes.

Okay, but this is FreeBSD. All they have to do is import whatever code into src and use it.
But none of that supports the assertion that C makes it hard to use good libraries. You can even use libraries not written in C if you want.

If the argument is really "it's impossible to make a good library in C", that's different. I'd very much disagree with that, but it would be to the point.

I'm saying "it is harder to consume good libraries in C, because it is harder to find them & harder to build them; and once you have done both, you find that good library A and good library B work in very different ways, so you have to do more work to adapt".

And I haven't mentioned the lack of a strong universal string type, the way many libraries typedef their own family of different kinds of integer, the way one library will require you to free() a returned structure and another will require you to call my_library_free()...

It all adds up to additional friction.

You don't have to agree! Maybe I am out of date, I haven't really dealt with this since the mid 2000's. I'd be thrilled to hear this isn't an issue any more.

> You don't have to agree!

It's not really a matter of whether or not I agree. I was just trying to understand what the assertion was!

I was baffled by the notion because I couldn't think of anything inherent in the language that made it hard to use good libraries. Now I understand that's not really what the assertion was.

Although I usually rant about C, using libraries is surely not a problem specifically in the world of UNIX package managers.
The original assertion was about difficulty of using C libraries in the kernel or bootloader. In the bootloader you're the OS. There's no file system, no dynamic linker, and no processes. There's no guarantee some third party library will work in that environment. It might but it's work to make sure it does or adapt it if it doesn't.
Let's say you want to develop a CLI tool in C for crawling a website's sitemap.xml as advertised by the website's robots.txt. How would you approach this development in C?

With e.g. Java, Javascript, PHP, and Python it's clear to me.

With C, I don't know.

libcurl and expat (or one of a zillion other XML libraries)?
> cargo for rust, pip for python, etc

GOOD! I love that C lacks this pollution. Means that code is written for-purpose and tuned for-purpose.

... except in this case (and many others), where it was written for-purpose and then never tuned.
Is the purpose opening up new vulnerabilities?
Custom memory allocation is pretty optional and a lot of the time could be handled with a single buffer.

Outside of that you don't need to deal with exceptions in a sorting library, and you can happily make it a single .c and .h

There's no standard build system. Think about how you add a dependency in Rust, Go, JavaScript, even Python.

Now do it in C/C++. Absolute nightmare.

Look at how many header-only C++ libraries there are out there. They're making compilation time drastically worse purely to avoid the need to faff with a shitty build system. It's a good trade-off too. I use header-only libraries where possible. They're convenient and popular.

Actually vcpkg seems to be going some way to fixing that but I have yet to convince my co-workers to use it.

> to avoid the need to faff with a shitty build system.

Then maybe don't use a shitty build system?

It's true, C is not trying to be a programming environment or tech stack. It's a language, that's it. Whether or not that's desirable depends on what you're trying to do, it's not something that is good or bad in some absolute sense.

You have your choice of build systems, so pick one that meets your needs.

Vcpkg isn't for me, either, because it doesn't solve any problem I have. If it does for you, awesome!

> You have your choice of build systems, so pick one that meets your needs.

And can I pick the one that my dependencies use too? Didn't think so.

Your dependencies have already been built. You're just linking to them. So yes, you can.
And what if the dependencies haven't been built yet?
C can’t be used to implement a good library for many problems due to being inexpressive. For example, you can’t write an efficient, generic vector data structure, but sort functions are also only fast due to the compiler’s smartness — passing a function pointer would be strictly slower.

Though this has not much relevance here as it is about assembly.

C arrays suck on the best of days and qsort requires that you understand function pointers and types which are some of the hairiest C syntax in common use. The C Clockwise Spiral rule is truly special.

It's easy to lose sight of the climb once you're at the top.

I'd suspect kernel devs know about function pointers.
I'd expect kernel devs to carry lots of bad habits and poorly calibrated intuition from their noob days. Example: "for loops are fine."
What’s wrong with for loops?
They encourage accidentally quadratic behavior if they are easier than calling sort().
Well, we could debate this, but it's all irrelevant to the assertion that C makes it hard to use good libraries.
Oh, right, I forgot that C's library/package management situation sucks so hard that makes the awful syntax look like a comparatively small problem.
This actually made me laugh out loud. Yes, if your problem with C is that it doesn't need a package management mechanism like some other languages, then C is clearly not for you. But C is very far from the only language like this.

It's a bit like criticizing a fish for having no legs.

> if your problem with C is that it doesn't need a package management mechanism like some other languages

The problem isn't that it doesn't need one, it's that it doesn't have one. I have no idea why you would think that it doesn't need one.

Well there is vcpkg now anyway so it finally does have one.

C package management works great on my system.

  # dnf install foo-devel
That's suboptimal for many reasons. Package names are not consistent. Installing multiple versions is usually impossible. Huge pain for library authors. Doesn't integrate with build systems usually (even basic pkg-config support is iffy). The command is OS-specific. You can't usually choose the linking method. Difficult to bundle dependencies. Usually out of date. Etc. etc.
A lot of these comments are like "C is hard if you don't know C and it scares you".

Not to mention kernel mode doesn't want a gigantic library package manager to pull in leftpad() from the internet. As mentioned, the kernel libraries on FreeBSD have a qsort, but they didn't in the original commit from 3 decades ago or whatever.

> This makes no sense to me. What about C makes it hard to use a good library?

It doesn't have templates/generics.

The lack of namespaces is of far greater consequence.
LIBNAME_actual_function_name()
Annoyance #1: External symbols are expected to be unique within 31 characters according to the standard, so you're limited to a few namespace "levels", at most.

curl_easy_option_by_name() is already up to 24 characters and there's only two "levels" in curl_easy_*.

Annoyance #2: There's no formal registrar for LIBNAME. This isn't a big deal for popular libraries, but it's a pain having to keep a locally-modified copy of a less popular dependency just because it shares its name with another less popular dependency.

Annoyance #3: LIBNAME_actual_function_name() is a pain to read and using either the preprocessor or static inlines to locally alias function names for the sake of readability is silly.

@1: The limits are considered obsolete since C89 and implementations are encouraged to avoid them whenever possible. I think the same way we disregard non ASCII character sets, non two's complement encodings and similar, we are safe in assuming sane implementation being able to handle longer names. And being honest, if given implementation isn't capable of more than 31 significant characters, then it would have problems with namespaces too.

@2: Agree, although I don't recall this ever happening to me.

@3: Is it? How is libname::actual_function_name() much better?

I actually like to use libname__actual_function_name(), as it further separates "namespace" from function name (unless we need compatibility with C++, as IIRC it reserves all double underscores, not only at the beginning).

C11 added (limited) generic support
They used "_Generic" as a keyword but it doesn't really do that.

Suppose I need to define a copy assignment operator for the library's sort function to use. Is there a good way to overload it? Can the library know what the size of each element is based on its type without having to pass it in as a parameter?

You can pass function pointers to the library, but that quickly becomes awful.

  /* sort for basic integer types */
  void lib_sort_u8(uint8_t *a, size_t count);
  void lib_sort_i8(int8_t *a, size_t count);
  void lib_sort_u16(uint16_t *a, size_t count);
  void lib_sort_i16(int16_t *a, size_t count);
  void lib_sort_u32(uint32_t *a, size_t count);
  void lib_sort_i32(int32_t *a, size_t count);
  void lib_sort_u64(uint64_t *a, size_t count);
  void lib_sort_i64(int64_t *a, size_t count);
  /* specify a comparator */
  void lib_sort_comp(void *start, size_t nmemb, size_t size, int (*compar), const void *, const void *));
  /* specify a comparator and an assignment operator */
  void lib_sort_assign_comp(void *start, size_t nmemb, size_t size, void (*assign)(void *, const void *), int (*compar)(const void *, const void *));
  /* specify a comparator, an assignment operator, and ... */
Or, you get one function that takes all of the arguments and have to define and pass in a bunch of function pointers and type size parameters that are each an opportunity for bugs or UB in order to sort a simple array of integers.

If my type needs a custom assignment operator, I need each library I use to take that as an argument. One expects the function pointer to take the arguments in the order (src, dst), another as (dst, src), a third specifies the return value as int instead of void, a fourth takes the source argument as "void *" instead of "const void *" in case you want to implement move semantics and a fifth doesn't support custom assignment operators at all.

It's no surprise that people prefer to avoid this.

You can't specialize a generic algorithm for specific operations and data structures; in C++ terms, it gives you overloading, not templates.
It gives generic programming. C++ templates aren't the only possible implementation.
So where is a generic vector data structure written in plain C that is efficient (that is, doesn’t store pointers to elements).
Slight scarcasm. yes too hard.

http://man.openbsd.org/qsort

having said that this specific sort is somewhere deep in kernel boot land. and kernel code can't really use the standard library. I am not sure if there is a standard kernel sort.

Of course it can. Just extract the qsort implementation to a static library pulled in by the kernel and by libc.
downvote me if you want but this is probably the best argument for rust I've ever seen