Hacker News new | ask | show | jobs
by daniel-thompson 1120 days ago
Dawson's law strikes again!

> O(n^2) is the sweet spot of badly scaling algorithms: fast enough to make it into production, but slow enough to make things fall down once it gets there.

https://randomascii.wordpress.com/2021/02/16/arranging-invis...

9 comments

GTA online was struck too

How I cut GTA Online loading times by 70% https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...

Lesson here is to profile your software at some point.

I'm playing a game (Fenyx Rising) and after launching it I always wonder why "Checking for additional content" screen takes 20-30 seconds. I'm pretty sure it should be just a single request.

I doubt that GTA Online released the game with 60K+ items in their online store.

The runtime for the release day store inventory count may have been okay, but I don't think that Rockstar kept profiling their game every time that they modified their store inventory.

The amount of pain Rockstar inflicted on their 90K+ users shows that Rockstar didn't care that their game took more than 3 mins to startup for the majority of their users.

As someone who worked on a smaller, but still millions-of-users live-service game, profiling of average resource usage and the loading screens was done periodically even if there were no code changes to the area of the game in question.

Given that our team was over an order of magnitude smaller than Rockstar, I would be very surprised if they did not have anyone even casually browsing a profiler every 3 months or something, though I think at their scale (LinkedIn claims 5k employees) they can probably have a team or two where everyone's entire job description be performance maintenance.

Video games are probably some of the most profiled code there is (from experience).

But the profiling is all done on the game loop; I've never heard of any teams profiling the startup...

We definitely profile "dev startup" a lot, it directly impacts team speed and ability to work on the codebase. Dev startup usually includes a majority of "user startup" too, so it helps -- but it's also really easy to overlook stuff like "massive mods/mod dirs" and the like.

Speaking from experience... Once, I was so annoyed at a "startup UI"-related dialog taking way (wayyyy) too long to load. I dropped an expletive into Slack and, not much later, I was guided toward fixing my own ~2 year old mistake. We were scanning some "user-generated files" in a way that worked fine but scaled horribly -- the operation went from multiple seconds down to milliseconds. Ugh.

Every few months or so I come back to that article always in awe
I was happy to see R* patched the bug and gave him a $10k bounty for it since last time I saw it.
For a collection of similar stories:

https://accidentallyquadratic.tumblr.com/

the funniest comment ever posted on HN was something like:

"everytime this blog is linked, I end up reading the whole thing"

At least that's probably only O(n) time;)
Not if every single one of its posts gets linked.
About the regexp one: I once wrote a regexp for CSS. It wasn't complete, but you would be able to pinpoint a syntactic error. I hooked it up to a text field, and started entering CSS. All went fine, until I hit a limit, and adding a single character froze Chrome for more than a minute (at which point I killed it).

I don't think it was accidentally quadratic. More likely, it was exponential.

I submitted some posts to this but it doesn't seem like the author has updated it. I probably run into one of these once a month!
Probably because it's MUCH easier to code bubblesort without making mistakes that cause it to not terminate or some such. Especially if they are writing the bootloader in assembly.

For something mission critical like a bootloader that's more valuable than turning O(n^2) into O(n log n). People running systems like BSD largely don't care how long the system takes to boot, once it's booted the system runs for years.

The funny thing is that, in my experience, bubble sort is actually pretty hard to write, because the finicky details of the swap step don't make sense (in a "wait, this can't possibly be right, there's no way you'd want to write it like that" kind of way). Better than even odds that if you have someone write a "bubble sort" without reference, you get an accidentally insertion sort variant.
The easiest sort to implement in assembly is probably selection sort. Bubble sort is actually messier :)

Source: written sorts in assembly more times than I would care to count.

You shouldn’t write a sort algorithm because much smarter people have already done so and their work is included in the standard library.
Sure. In this case, the smarter people wrote the sort in ~1995 and it was good enough for nearly 30 years, but now someone has to step up and be smart again.

You can't always rely on smarter people to be there to do things for you. And you also can't rely on the standard library to have been written by smarter people anyway, and even if so, to have been written by smarter people in order to handle the situation you find yourself in. There's lots of ways to sort, and lots of good reasons to choose ways that are less than optimal for the use cases they end up being used in.

You’re defending the claim that implementing qsort is too hard, definitely the people who wrote the standard library are smarter than the people putting in bubble sort because quicksort is too hard.

This is just a moronic defense of the venerated FreeBSD developers, it’s on a level equal to organized religion. The FreeBSD developers are fine developers and this was dumb, that’s why they replaced it.

And in this day and age there really is no argument for any user space c environment to exist where the qsort standard library function is not available. And even if there was, it would still be smarter to just copy and paste the battletested code from the c library than write another implementation. Because that’s how you end up with bubblesort because doing it right is too hard.

The “standard library” often isn’t available in early boot scenarios.
True, but there are plenty of CS books about algoriths and data structures.
Copying a convenient-looking one out of a CS book is how you end up with a bubble sort. Approximately nobody comes up with a bubble sort from scratch; it's obviously, gratuitously bad in a way that there's no practical reason to think of on your own. The sorts that people come up with without reference are usually insertion and selection sorts—those two were discovered before bubble sort.
Bubble sort occupies a weird space where people assume it's dumb and simple so it'll be the least code, but even insertion sort tends to beat it.
Plenty of children come up with bubble sort as a sorting algorithm. It’s intuitive to go down the list swapping any pairs that happen to be in the wrong order.
Only if they never managed beyond first chapter.
Lifting sample code out of an academic text, now there's a winning strategy for rock solid stability, correctness, and performance!
It is at least on the world where being an engineer actually requires a degree, and not just because someone likes the word.

The real ones, the ones that write USENIX papers.

Converting those into code is not always trivial.
Depends on the quality of the degree.

Besides I would assume anyone getting ready for their leetcode assignments goes through them anyway.

The standard library can be statically linked...
People who think this way haven’t written boot code.. I suppose you’re gonna link the c runtime too and it’s assumption of being a “process” under some “operating system”.. oh wait.

Compared to the rest of the task, writing a sort is pretty darn trivial.

This is true if we're talking about the first stage bios boot that needs to fit in 512 bytes, but there aren't any particular restraints on kernel size at the point in question. Link in anything you want, including qsort.
O(n^2) algorithms often cause performance issues. The main cases I have seen in business logic are: (A) offset pagination (select ... offset n) and then paginate over all entries, and (B) read a text value, append something, store, repeat.
Ummm.. Sure, but what's happening here is that FreeBSD stores a list in a file not in the desired order, then sorts it before use.

It seems to be any prolonged discussion about which sorting algorithm should be used is sort of skipping the elephant. Why is the list not sorted to begin with?

Without that basic knowledge it isn't very productive to focus on implementation details, no matter how fun the sorting exercise is. Deleted code is fast code.

I was talking about "exponential algorithms" in general, and about business logic. I know FreeBSD isn't business logic, but low-level kernel code. I don't know the details of the FreeBSD problem.
How select with offset leads to O(n²)?
For each page, you have a select statement. The first without offset, then with offset 10 limit 10, then offset 20 limit 10, offset 30 limit 10 and so on. The database engine will, for each query, read all entries up to the offset + limit. This is a staircase pattern: first it reads 10, then 20, then 30, and so on. So the sum of read entries is n * n / 2, which is O(n^2).

One could argue that the database engine should be "smarter", but it's not. Note that data could be added or removed in the meantime, so the database engine can't really cache the result. See also https://stackoverflow.com/questions/70519518/is-there-any-be...

Sorry, I've missed the "and then paginate over all entries" part. Anyway thank you for the detailed explanation.
So, how to do offset w/o using OFFSET?
Include the last id (which should be indexed) of the previous page in the next where filter.

https://use-the-index-luke.com/sql/partial-results/fetch-nex...

That's pagination, not indexed page scanning. Both have their place but they're not the same. Pagination is way better to handle updates between page loads and generally more complicated to implement. As you're now doing head tail index cursor tracking. Flat boring offset/limit is amazingly simple for the happy lazy path which is probably fine for most apps.
Make a query whose parameters exclude the previous page results altogether. I learned about this from here: https://www.citusdata.com/blog/2016/03/30/five-ways-to-pagin...
If you need to iterate over all records, just do it? Why do you need offset.

Otherwise using offset usually is OK idea. Because users very rarely will inspect page #2153. They're interested with page 1, sometimes page 2. limit/offset works fine for those cases and it'll work for page 2153 for those who visit it once in a decade. Using ids makes logic to trac prev/next/page number incredibly complex and generally you don't need it.

> If you need to iterate over all records, just do it?

Who is "you" here?

Usually what happens is that party A builds a REST API (or other connectionless protocol) for fetching a list of some entity-type; and they limit the number of items that can be fetched in one request (because they have a billion such entities, and they don't want to even try to imagine what kind of system would be necessary to generate and stream back a 5TB JSON response); which implies pagination, to get at "the rest of" the entities.

Then party B, a user of this API, decides that they want to suck party A's whole billion-entity database out through the straw of that REST API, by scraping their way through each page.

> it'll work for page 2153 for those who visit it once in a decade

To be looking at page 2153, the user probably first visited every page before 2153. Which means they didn't do one O(N) query (which would by itself be fine); but rather, they made O(N) requests that each did an O(N) query.

I regularly change 3 to 2 in /new/345689 links when bored with today’s content.

Using ids makes logic to trac prev/next/page number incredibly complex and generally you don't need it.

When it’s a public site, users may post so fast that this “next” can show some previous page. Paging via ids is a must there.

> “next” can show some previous page

That is usually a non-issue. The cost in DB operations is usually much more relevant than it.

When people do actually care about fully enumeration and unicity of the items they they are querying, "pagination" itself tends to be a too messy concept.

The cost in DB operations is usually much more relevant than it.

As a result, a user (all of them) hits “next” again until a page looks like containing new posts. It’s multiple requests wasted.

Anyway, what exactly becomes messy?

Use a sorted index (e.g. Btree), and use those values to quickly find the start of the next page of results.

For good performance this also requires that your search criteria (if any) be compatible with this index.

Off Topic: "Laying out icons on a grid should be an inherently linear operation"

it doesn't seem mentioned in the HN thread the cause here is probably the same thing O(n^2): sorting. laying out icons is only linear if the icons are read in the order that they're placed. It's been a long time since I used windows regularly but my memory is the default placement is by creation time. So if they're read off disk by filename (or some sort of hash) they'd need to be sorted by timestamp.

Even if the disk sorts directory entries by file name and you want to show them sorted by file name, chances are you have to sort.

Reasons? Firstly, you’d have to know the entries are sorted. For that, you need an API that tells you that or hard-code information about file systems in your code. They may exist, but I’m not aware of any file system API that provides that information.

Secondly, the file system may not return the names sorted in the locale you want to sort them in.

Thirdly, the sorting code used in the file system may contain a bug. Once file systems are out there, you can’t fix them (happened in one of Apple’s file systems. HFS, IIRC)

Lastly, modern GUIs tend to sort file names containing numbers non-alphabetically, so that, for example “file 2.jpg” gets sorted before “file 12.jpg”.

So, I think it’s easier to always sort. I would pick an algorithm that works well when items are mostly sorted at the start, though.

you have indexes............
This made me recall modern AI and the issue with quadratic complexity in its transformers. Ooof! A breakthrough here would be a true Breakthrough™ with remarkably larger context sizes. Like it would barely even be a limit anymore and be transformative (har har) to what they can be used for.
Thanks, that's a fun read although I don't understand much of it. I do understand the gist.
Discoveries and analysis like this blog post and the parent show the difference between programmers and engineers.
True but I think the real cause of this is surely that C makes it too hard to use a sorting library that someone competent has written. I would not be surprised if the author was fully aware of the N^2 complexity but opted for a simpler implementation anyway.
qsort() not good enough for you?

(More realistically, below people are discussing that in the kernel environment the set of standard or third party library available may be unavoidably limited)

There is a quirk (almost a bug?) in FreeBSD's qsort where it will switch to insertsort for the whole array under certain conditions, which we hit in production given how our data was arranged.

(I think this was the check) https://github.com/freebsd/freebsd-src/blob/main/lib/libc/st...

Did a bit of digging and found that there used to be a comment for why it was done, but it got removed [0] when they switched to the implementation from Bentley & McIlroy's "engineering a sort function" [1] around 1992.

[0]: https://github.com/weiss/original-bsd/commit/d3fcf71e0db57cb...

[1]: https://cs.fit.edu/~pkc/classes/writing/papers/bentley93engi...

I like how someone felt the need to write out insertion sort in some 4 line code golf challenge in the midst of qsort. This right here is why no one wants to deal with C anymore.
qsort() is pretty slow if you're sorting something like long[]. In that case, radix sort goes 5x faster and vectorized quicksort goes 10x faster. If you're sorting int[] then vectorized quicksort goes 25x faster than qsort(). Nothing goes faster. The issue is it's a big ugly c++ monstrosity that's a huge pain to compile.
That's fair if the constant factor is relevant, but if bubble sort is terminating in any reasonable timescale then the difference between qsort, C++ std::sort, and a custom implementation is really not a factor.
People who don't compute much data don't need computer science.
But if you're comparing to bubblesort.....
No. In most languages sorting a container is `foo.sort()` or something similar. `qsort()` is much more faff.

I mean, clearly it wasn't good enough otherwise they would have used it, no? Perhaps integrating it was a huge faff.

Nonetheless, qsort() is available in libkern.
Not at the time that the bubble-sort code was used though. see https://news.ycombinator.com/item?id=36005209
You mean written. It's been available for many of the years that it has been used. (-:
> that C makes it too hard to use a sorting library that someone competent has written.

This makes no sense to me. What about C makes it hard to use a good library?

I think it comes down to ecosystem fragmentation.

A lot of languages now have common tooling (cargo for rust, pip for python, etc) which makes it easier to find and incorporate the libraries you want. Apparently there are tools like https://conan.io/ but they're not as widely-adopted.

C's build system is similarly non-uniform. Many packages use Makefiles, others use different build mechanisms.

C has no universally-agreed error handling mechanism or convention. Whether exceptions or golang's error interface, you can generally assume that a random package in most languages will handle errors the way you expect. In C it's a lot more varied.

Similarly memory allocation - sometimes in a larger application you want to customize how malloc and friends work. How and whether you can do that for a C package is non-uniform.

Mind you the C standard library has a sort() function which will have sensible big-O behavior on pretty much any platform. I suspect this specific problem is more to do with this being kernel-mode code which has a lot of special conditions.

I am always amazed by arguments that say that not having a language package manager like cargo or pip makes it hard.

Really? Is it really considered hard to link a library without them? Am I so old that somehow I grew up in a world where linking a library was not considered black magic?

There is a lot more going on here.

The first issue is actually downloading the dependencies, doing this manually quickly becomes infeasible for any non-trivial project.

The second issue is keeping everything updated, and making sure that all packages are compatible with all other packages. Doing this manually is also not easy.

With C specifically, you need to wrangle different build systems, and once you have them built and "installed", you need to figure out which linker and compiler flags are needed to consume the libraries.

If you are working on a small project with maybe a few dependencies you can do this by hand, but when you get to say, 15 dependencies, it quickly becomes very difficult.

You can use the system package manager on some systems to install libraries I guess (assuming it has the packages and versions that you need), in this case manually managing things could be a lot easier, but you still should be using pkg-config for portability purposes.

Okay, but this is FreeBSD. All they have to do is import whatever code into src and use it.
But none of that supports the assertion that C makes it hard to use good libraries. You can even use libraries not written in C if you want.

If the argument is really "it's impossible to make a good library in C", that's different. I'd very much disagree with that, but it would be to the point.

I'm saying "it is harder to consume good libraries in C, because it is harder to find them & harder to build them; and once you have done both, you find that good library A and good library B work in very different ways, so you have to do more work to adapt".

And I haven't mentioned the lack of a strong universal string type, the way many libraries typedef their own family of different kinds of integer, the way one library will require you to free() a returned structure and another will require you to call my_library_free()...

It all adds up to additional friction.

You don't have to agree! Maybe I am out of date, I haven't really dealt with this since the mid 2000's. I'd be thrilled to hear this isn't an issue any more.

> You don't have to agree!

It's not really a matter of whether or not I agree. I was just trying to understand what the assertion was!

I was baffled by the notion because I couldn't think of anything inherent in the language that made it hard to use good libraries. Now I understand that's not really what the assertion was.

Although I usually rant about C, using libraries is surely not a problem specifically in the world of UNIX package managers.
The original assertion was about difficulty of using C libraries in the kernel or bootloader. In the bootloader you're the OS. There's no file system, no dynamic linker, and no processes. There's no guarantee some third party library will work in that environment. It might but it's work to make sure it does or adapt it if it doesn't.
Let's say you want to develop a CLI tool in C for crawling a website's sitemap.xml as advertised by the website's robots.txt. How would you approach this development in C?

With e.g. Java, Javascript, PHP, and Python it's clear to me.

With C, I don't know.

libcurl and expat (or one of a zillion other XML libraries)?
> cargo for rust, pip for python, etc

GOOD! I love that C lacks this pollution. Means that code is written for-purpose and tuned for-purpose.

... except in this case (and many others), where it was written for-purpose and then never tuned.
Is the purpose opening up new vulnerabilities?
Custom memory allocation is pretty optional and a lot of the time could be handled with a single buffer.

Outside of that you don't need to deal with exceptions in a sorting library, and you can happily make it a single .c and .h

There's no standard build system. Think about how you add a dependency in Rust, Go, JavaScript, even Python.

Now do it in C/C++. Absolute nightmare.

Look at how many header-only C++ libraries there are out there. They're making compilation time drastically worse purely to avoid the need to faff with a shitty build system. It's a good trade-off too. I use header-only libraries where possible. They're convenient and popular.

Actually vcpkg seems to be going some way to fixing that but I have yet to convince my co-workers to use it.

> to avoid the need to faff with a shitty build system.

Then maybe don't use a shitty build system?

It's true, C is not trying to be a programming environment or tech stack. It's a language, that's it. Whether or not that's desirable depends on what you're trying to do, it's not something that is good or bad in some absolute sense.

You have your choice of build systems, so pick one that meets your needs.

Vcpkg isn't for me, either, because it doesn't solve any problem I have. If it does for you, awesome!

> You have your choice of build systems, so pick one that meets your needs.

And can I pick the one that my dependencies use too? Didn't think so.

Your dependencies have already been built. You're just linking to them. So yes, you can.
C can’t be used to implement a good library for many problems due to being inexpressive. For example, you can’t write an efficient, generic vector data structure, but sort functions are also only fast due to the compiler’s smartness — passing a function pointer would be strictly slower.

Though this has not much relevance here as it is about assembly.

C arrays suck on the best of days and qsort requires that you understand function pointers and types which are some of the hairiest C syntax in common use. The C Clockwise Spiral rule is truly special.

It's easy to lose sight of the climb once you're at the top.

I'd suspect kernel devs know about function pointers.
I'd expect kernel devs to carry lots of bad habits and poorly calibrated intuition from their noob days. Example: "for loops are fine."
What’s wrong with for loops?
Well, we could debate this, but it's all irrelevant to the assertion that C makes it hard to use good libraries.
Oh, right, I forgot that C's library/package management situation sucks so hard that makes the awful syntax look like a comparatively small problem.
This actually made me laugh out loud. Yes, if your problem with C is that it doesn't need a package management mechanism like some other languages, then C is clearly not for you. But C is very far from the only language like this.

It's a bit like criticizing a fish for having no legs.

C package management works great on my system.

  # dnf install foo-devel
A lot of these comments are like "C is hard if you don't know C and it scares you".

Not to mention kernel mode doesn't want a gigantic library package manager to pull in leftpad() from the internet. As mentioned, the kernel libraries on FreeBSD have a qsort, but they didn't in the original commit from 3 decades ago or whatever.

> This makes no sense to me. What about C makes it hard to use a good library?

It doesn't have templates/generics.

The lack of namespaces is of far greater consequence.
LIBNAME_actual_function_name()
Annoyance #1: External symbols are expected to be unique within 31 characters according to the standard, so you're limited to a few namespace "levels", at most.

curl_easy_option_by_name() is already up to 24 characters and there's only two "levels" in curl_easy_*.

Annoyance #2: There's no formal registrar for LIBNAME. This isn't a big deal for popular libraries, but it's a pain having to keep a locally-modified copy of a less popular dependency just because it shares its name with another less popular dependency.

Annoyance #3: LIBNAME_actual_function_name() is a pain to read and using either the preprocessor or static inlines to locally alias function names for the sake of readability is silly.

C11 added (limited) generic support
They used "_Generic" as a keyword but it doesn't really do that.

Suppose I need to define a copy assignment operator for the library's sort function to use. Is there a good way to overload it? Can the library know what the size of each element is based on its type without having to pass it in as a parameter?

You can pass function pointers to the library, but that quickly becomes awful.

  /* sort for basic integer types */
  void lib_sort_u8(uint8_t *a, size_t count);
  void lib_sort_i8(int8_t *a, size_t count);
  void lib_sort_u16(uint16_t *a, size_t count);
  void lib_sort_i16(int16_t *a, size_t count);
  void lib_sort_u32(uint32_t *a, size_t count);
  void lib_sort_i32(int32_t *a, size_t count);
  void lib_sort_u64(uint64_t *a, size_t count);
  void lib_sort_i64(int64_t *a, size_t count);
  /* specify a comparator */
  void lib_sort_comp(void *start, size_t nmemb, size_t size, int (*compar), const void *, const void *));
  /* specify a comparator and an assignment operator */
  void lib_sort_assign_comp(void *start, size_t nmemb, size_t size, void (*assign)(void *, const void *), int (*compar)(const void *, const void *));
  /* specify a comparator, an assignment operator, and ... */
Or, you get one function that takes all of the arguments and have to define and pass in a bunch of function pointers and type size parameters that are each an opportunity for bugs or UB in order to sort a simple array of integers.

If my type needs a custom assignment operator, I need each library I use to take that as an argument. One expects the function pointer to take the arguments in the order (src, dst), another as (dst, src), a third specifies the return value as int instead of void, a fourth takes the source argument as "void *" instead of "const void *" in case you want to implement move semantics and a fifth doesn't support custom assignment operators at all.

It's no surprise that people prefer to avoid this.

You can't specialize a generic algorithm for specific operations and data structures; in C++ terms, it gives you overloading, not templates.
It gives generic programming. C++ templates aren't the only possible implementation.
Slight scarcasm. yes too hard.

http://man.openbsd.org/qsort

having said that this specific sort is somewhere deep in kernel boot land. and kernel code can't really use the standard library. I am not sure if there is a standard kernel sort.

Of course it can. Just extract the qsort implementation to a static library pulled in by the kernel and by libc.
downvote me if you want but this is probably the best argument for rust I've ever seen