> Linked lists are taught as fundamental data structures in programming courses, but they are more commonly encountered in tech interviews than in real-world projects.
I beg to disagree.
In kernels, drivers, and embedded systems they are very common.
Most people who take data structures courses or do tech interviews don't end up working on kernels, drivers, or embedded systems, though. To me, it sounds like the point being made is that there are a large number of programmers who have learned about linked lists but haven't run into many cases where they needed them in the real world, and I think that's accurate.
Agreed. I can't recall using anything more complicated than lists/arrays or hash tables (key/value stores) in many years of (mostly web application) programming. And even those I don't code from scratch; I use the classes or functions my programming language gives me. For anything more complicated than that, I use a database, which of course uses many data structures under the covers, but I don't touch those directly.
I used to use them all the time. Now, though? I would be hard-pressed not to reach for one of the many built-in vector/list/dict/hash types that most languages provide. I would have to be doing something very low-level or speed-critical to hand-roll a linked list.
As a counterpoint, I’ve been working on collaborative text editing. I ended up implementing a custom B-tree because we needed a few features that I couldn’t find in any off-the-shelf library:
- My values represent runs of characters in the document.
- Inserts in the tree may split a run.
- Runs have a size: 0 if the run is marked as deleted, otherwise the number of characters. The size changes as we process edits.
- Every inserted character has an ID. I need to be able to look up any run by its ID and then edit it (including querying the run’s current position in the tree and editing the run’s size).
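To make the run-splitting requirement concrete, here is a minimal sketch in C. It is not the commenter's B-tree: it assumes a plain singly linked list of runs (so position and ID lookups are O(n) rather than O(log n)), and the `Run` struct and the `run_size`, `split_run`, `find_run`, `insert_at`, and `find_by_id` helpers are hypothetical names for illustration. It shows deleted runs contributing size 0 and an insert landing mid-run, which forces a split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical run: `len` consecutive characters whose IDs are
   first_id, first_id + 1, ..., first_id + len - 1. */
typedef struct Run {
    int first_id;
    int len;
    bool deleted;
    struct Run *next;
} Run;

/* Visible size of a run: 0 if deleted, otherwise its length. */
static int run_size(const Run *r) { return r->deleted ? 0 : r->len; }

/* Split `r` so it keeps its first `offset` characters; a new run holding
   the rest is linked in right after it. Returns NULL if malloc fails. */
static Run *split_run(Run *r, int offset) {
    assert(offset > 0 && offset < r->len);
    Run *tail = malloc(sizeof *tail);
    if (tail == NULL) return NULL;
    tail->first_id = r->first_id + offset;
    tail->len = r->len - offset;
    tail->deleted = r->deleted;
    tail->next = r->next;
    r->len = offset;
    r->next = tail;
    return tail;
}

/* Find the run containing visible position `pos` and the offset inside it.
   Deleted runs have size 0, so they are skipped naturally. */
static Run *find_run(Run *head, int pos, int *offset) {
    for (Run *r = head; r != NULL; r = r->next) {
        int size = run_size(r);
        if (pos < size) { *offset = pos; return r; }
        pos -= size;
    }
    return NULL;
}

/* Insert run `nr` so its characters start at visible position `pos`,
   splitting the run that `pos` falls inside of, if any. */
static void insert_at(Run **head, int pos, Run *nr) {
    if (pos == 0) { nr->next = *head; *head = nr; return; }
    int off = 0;
    Run *r = find_run(*head, pos - 1, &off);   /* run holding the char before pos */
    assert(r != NULL);
    if (off + 1 < r->len) {                    /* pos is mid-run: split it */
        Run *tail = split_run(r, off + 1);
        assert(tail != NULL);                  /* sketch: real code would propagate this */
        (void)tail;
    }
    nr->next = r->next;
    r->next = nr;
}

/* Look up the run holding character `id` (a linear scan in this sketch;
   the real structure needs an index from ID to run). */
static Run *find_by_id(Run *head, int id) {
    for (Run *r = head; r != NULL; r = r->next)
        if (id >= r->first_id && id < r->first_id + r->len) return r;
    return NULL;
}
```

A real implementation would also cache subtree sizes in the tree's internal nodes so that position queries do not have to walk every run.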
It’s an interesting data structure problem, and it took a few weeks to find a good solution (and a few more weeks later to rewrite it in safe Rust, improving performance in the process).
I love this stuff. I think it’s pretty rare to find a reason to code your own collection types these days, but it certainly comes up from time to time!
> I love this stuff. I think it’s pretty rare to find a reason to code your own collection types these days, but it certainly comes up from time to time!
Absolutely! That is one of the places where you want that style of programming, since the base classes and built-in structs do not really cover it yet.
Also, as a counterpoint, the built-in ones sometimes have very interesting degenerate cases. I ran into one in an old library that basically doubled its memory footprint every time you exceeded its buffer. That was the point where it had to change to a fixed allocation or something else. If I hadn't known the fundamentals, I would have been totally in the weeds with no idea why it was doing that.
Nope; it has linked list syntax (that certainly isn't ignored even by very good compilers). Syntax isn't semantics.
The semantics are that a function, `write-string`, is called with a string as its argument.
The second expression has linked list processing in its semantics because you stuck in a cdr, as well as a quote, which makes a piece of the program available as a run-time list datum. (This could easily be optimized away in the executable form, but I would say that it has linked list processing in its abstract semantics.)
> Nope; it has linked list syntax (that certainly isn't ignored even by very good compilers)
We're looking at the same string and seeing different things. You're seeing `(write-string "hello world")` as a program, I'm seeing it as an expression.
It has linked list semantics, which you can preserve until runtime like this `'(write-string "hello world")`. Note that I didn't change the string, I changed its context. If the original were living in a string, and you called read on it, it would become a linked list. If you called eval on that list, it would become a function call. This is basic stuff which I'm well aware you know, so I'm not sure what all the quibbling is about.
You literally need a linked list to write a program in a language in which the code becomes linked lists. And you're going to have a bad time writing Lisp if you don't get the hang of cons cells, early and often.
Is "code is data" true, or false? You're trying to have it both ways here.
The "large number of programmers who have learned about linked lists but haven't run into many cases where they needed them in the real world" includes approximately zero programmers who have wielded Lisp in anger; that is my point. I thought that was pretty clear from context, but I guess not.
Really only because they’re so goddamn easy. I find myself using linked lists a lot less since adopting Rust for embedded code (even with no_std and no allocator, but especially when the alloc-only std data structures are within reach).
Linked lists were heavily used in application software before the appearance of standard libraries and Java, which is when dynamically sizable array-based lists became common. There also wasn't a performance gap between linked lists and arrays before CPUs became significantly faster than RAM.
First, my only guess is that everyone's guesses are going to be wildly wrong. People who work in such spaces will greatly overestimate. People who don't will greatly underestimate. (This is mostly due to how many comments I've read on HN that implicitly assume that most people's problems and perspectives are the same as the commenter's.)
Second, linked lists are useful in a lot more places than that. Probably a better proxy would be low-level coders. You almost always want a linked list somewhere when you're dealing with memory addresses and pointers. Maybe not for the primary collections, but there are always secondary ones that need to maintain a collection with a different membership or ordering, and vectors of pointers don't have many clear advantages over intrusive linked lists for those secondary collections.
Yeah, intrusive collections in C are the biggest use I’ve seen. I played with a physics engine a few years ago (Chipmunk2D) that made heavy use of intrusive linked lists to store all the objects in the world model. I suspect there are some clever data structures out there that might perform better, but the intrusive linked list approach was simple and fast!
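Here is a minimal sketch of an intrusive doubly linked list in C, in the style of the Linux kernel's `list_head` but written from scratch (the `ListNode` and `Body` types and the `list_*` helpers are hypothetical, not Chipmunk2D's or Linux's API). Each object embeds its own link fields, so one `Body` can sit on a primary list and a secondary list at the same time, and unlinking it needs only a pointer to the object: no search and no extra allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list node, embedded inside the owning object. */
typedef struct ListNode {
    struct ListNode *prev, *next;
} ListNode;

/* Recover the containing struct from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(ListNode *head) { head->prev = head->next = head; }

/* Insert `n` just before the sentinel `head`, i.e. at the tail. */
static void list_push_back(ListNode *head, ListNode *n) {
    assert(n != NULL);
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* O(1) removal: only the node itself is needed. */
static void list_remove(ListNode *n) {
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;   /* leave it self-linked so it is safe to re-add */
}

static size_t list_len(const ListNode *head) {
    size_t count = 0;
    for (const ListNode *n = head->next; n != head; n = n->next) count++;
    return count;
}

/* Hypothetical physics body that is a member of two lists at once. */
typedef struct Body {
    int id;
    ListNode all_link;     /* primary collection: every body in the world */
    ListNode awake_link;   /* secondary collection: bodies being simulated */
} Body;
```

The `container_of` macro is what lets the list code stay generic while the links live inside `Body`: walking `awake` yields `ListNode` pointers, and `container_of(node, Body, awake_link)` turns each one back into its `Body`.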
More like 0.01% -- if we consider enterprise programmers, web programmers, and application/game programmers which I'd expect to be the largest groups...
Yep. There aren't many software developers I know who have ever touched {Linux, macOS, FreeBSD, Windows} kernel code except for embedded devs, driver devs, security researchers, hobbyists, and SREs/PEs.
The percentage who have touched kernel bits, written a triangle engine scene renderer, written a compiler, touched server metal in production, worked on ASICs, and can put together ML/AI building blocks shrinks way, way down to a handful of living humans.
This is not about blue collar vs. white collar. After all, corporate programmers and web programmers can both be blue collar, and systems programmers can be white collar (if we're using "blue collar" to mean smaller salaries and fewer perks; otherwise programming is a white-collar job anyway).
This is about how many work in kernels/embedded systems/etc vs more common programming gigs. And that's less about how many are trained to do so, but rather how many are needed.
There are plenty of good uses for linked lists and their variants. LRU lists come to mind: I wouldn't bet that a linked list is the most efficient way to implement one, but it's pretty darn good. Then there are things like breadth-first search, which obviously needs some kind of queue. It often comes down to memory pressure: if you've got gigabytes to spare, allocating a contiguous block of memory for a list of something isn't a big deal, but if memory is tight and maybe fragmented, linked lists can and will get it done. They have their places.
I did start to encounter some fresh grads with degrees that said "computer science" on them who couldn't answer some basic linked list questions. I was beginning to think it was a bad set of questions until I hit those kids. If you claim to know "computer science" and don't know what a linked list is, especially beyond textbook stuff, I'm probably not interested.
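The LRU bookkeeping can be sketched in C as follows, assuming a fixed set of entries and that the caller already holds a pointer to the entry being accessed (a real cache would pair this list with a hash map from key to entry, which is omitted here). The list keeps entries in recency order, so a hit is an O(1) move-to-front and eviction takes the tail. `Entry`, `Lru`, and the `lru_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Entry {
    int key;
    struct Entry *prev, *next;
} Entry;

/* Most recently used entry at head, least recently used at tail. */
typedef struct {
    Entry *head, *tail;
} Lru;

/* Detach `e`, which must currently be on the list. */
static void lru_unlink(Lru *l, Entry *e) {
    if (e->prev) e->prev->next = e->next; else l->head = e->next;
    if (e->next) e->next->prev = e->prev; else l->tail = e->prev;
    e->prev = e->next = NULL;
}

static void lru_push_front(Lru *l, Entry *e) {
    e->prev = NULL;
    e->next = l->head;
    if (l->head) l->head->prev = e; else l->tail = e;
    l->head = e;
}

/* Called on every access to an entry already on the list: O(1). */
static void lru_touch(Lru *l, Entry *e) {
    assert(e != NULL);
    lru_unlink(l, e);
    lru_push_front(l, e);
}

/* Remove and return the least recently used entry (NULL if empty). */
static Entry *lru_evict(Lru *l) {
    Entry *victim = l->tail;
    if (victim) lru_unlink(l, victim);
    return victim;
}
```

The doubly linked layout is the point here: unlinking from the middle on every hit is what a singly linked list or an array cannot do in constant time.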
No memory allocation/reallocation: preallocated resources are managed in, e.g., a free list. Also, for things like packetized networks, lists are handy for filling as you progress down the stack while using fixed-size packet buffers, or for reassembling fragments.
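Fragment reassembly with fixed-size packet buffers can be sketched in C like this. The `Frag` struct, the tiny `BUF_SIZE`, and the `frag_*` functions are hypothetical: fragments may arrive out of order, each lives in its own fixed-size buffer that also carries the link, and they are kept in a list sorted by offset until the whole payload has arrived. The list code itself never allocates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8   /* hypothetical fixed packet-buffer size, tiny for the demo */

typedef struct Frag {
    size_t offset;                /* byte offset of this fragment in the message */
    size_t len;                   /* bytes used in data[] (<= BUF_SIZE) */
    unsigned char data[BUF_SIZE];
    struct Frag *next;
} Frag;

/* Insert a fragment into a list kept sorted by offset. No allocation:
   the fragment's own buffer holds the link. */
static void frag_insert(Frag **head, Frag *f) {
    Frag **pp = head;
    while (*pp != NULL && (*pp)->offset < f->offset) pp = &(*pp)->next;
    f->next = *pp;
    *pp = f;
}

/* True if the fragments cover [0, total) exactly, with no gaps or overlaps. */
static bool frag_complete(const Frag *head, size_t total) {
    size_t expect = 0;
    for (const Frag *f = head; f != NULL; f = f->next) {
        if (f->offset != expect) return false;
        expect += f->len;
    }
    return expect == total;
}

/* Copy the reassembled message into `out`, which must hold `total` bytes. */
static void frag_copy(const Frag *head, unsigned char *out, size_t total) {
    assert(frag_complete(head, total));
    for (const Frag *f = head; f != NULL; f = f->next)
        memcpy(out + f->offset, f->data, f->len);
}
```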
In embedded world, memory often needs to be exactly controlled, and allocation failures are fatal without a more complex MMU. In kernel world, I believe the main reason is that allocations can block.
In kernels, it's usually hard to get general-purpose allocation working reliably in all contexts. And you need that for resizable vectors. With lists, you just need to be able to grab an element-sized block. Quite often, it's even done at memory-page granularity.
In addition, a lot of data structures might be shared across multiple cores. Linked lists can be traversed and mutated concurrently (although with a bit of care).
I wonder how much of that is due to the kernel history, and the influence of C idioms, and not because of some inherent design superiority.
I'd be convinced once I see pure Rust kernels geared towards modern machines suddenly using linked lists everywhere. Otherwise I'm leaning towards it being a side-effect of the language choice and culture.
Also because I've seen the same kind of reasoning applied to compilers (e.g. "of course you need linked lists in compilers, they are extremely graph traversal heavy"). But one look at modern compilers implemented in Rust paints a very different picture, with index-based vectors, data-oriented design and flattened ASTs everywhere.
Getting a general memory allocator working in kernel contexts is a hard task. You need to make sure it can't block and is reentrant, that it doesn't result in fragmentation, and that it can be used from multiple threads.
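The flattened-AST idea is language-independent, so it can be sketched in C as well. In this hypothetical layout (`Node`, `Ast`, and the `mk_*`/`ast_eval` functions are illustrative names, not any compiler's API), all nodes live in one array and refer to their children by index rather than by pointer. Because a child is always appended before its parent, a single forward pass over the array evaluates the tree without recursion or pointer chasing.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64   /* fixed capacity keeps the sketch allocation-free */

typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

/* Hypothetical flattened AST node: children are indices into Ast.nodes. */
typedef struct {
    NodeKind kind;
    int32_t value;       /* NODE_NUM only */
    uint32_t lhs, rhs;   /* NODE_ADD / NODE_MUL only */
} Node;

typedef struct {
    Node nodes[MAX_NODES];
    uint32_t count;
} Ast;

static uint32_t ast_push(Ast *ast, Node n) {
    assert(ast->count < MAX_NODES);
    ast->nodes[ast->count] = n;
    return ast->count++;
}

static uint32_t mk_num(Ast *a, int32_t v) { return ast_push(a, (Node){NODE_NUM, v, 0, 0}); }
static uint32_t mk_add(Ast *a, uint32_t l, uint32_t r) { return ast_push(a, (Node){NODE_ADD, 0, l, r}); }
static uint32_t mk_mul(Ast *a, uint32_t l, uint32_t r) { return ast_push(a, (Node){NODE_MUL, 0, l, r}); }

/* Children always have smaller indices than their parent, so one forward
   pass computes every value up to and including the root. */
static int32_t ast_eval(const Ast *a, uint32_t root) {
    int32_t val[MAX_NODES];
    for (uint32_t i = 0; i <= root; i++) {
        const Node *n = &a->nodes[i];
        switch (n->kind) {
        case NODE_NUM: val[i] = n->value; break;
        case NODE_ADD: val[i] = val[n->lhs] + val[n->rhs]; break;
        case NODE_MUL: val[i] = val[n->lhs] * val[n->rhs]; break;
        default:       val[i] = 0; break;
        }
    }
    return val[root];
}
```

For example, building `(2 + 3) * 4` with `mk_num`, `mk_add`, and `mk_mul` produces five nodes in one contiguous array, and `ast_eval` returns 20.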
It can be solved (or worked around), but it's understandable that people don't _want_ to do that.
Any time you have a computer interacting with the outside world in an asynchronous fashion you basically have to have some form of buffering which takes the form of a queue/fifo. A linked list is the most performant/natural way of modeling a queue in our ubiquitous computing infrastructure.
For example, in a DMA-based Ethernet driver, the Ethernet MAC receives packets asynchronously from the processor, perhaps faster than the processor can ingest them. So the MAC interrupts the processor to hand it new packets. The processor can't sit processing the packets in interrupt context, so it needs to put them into some ordered list to process later when it has downtime. In a true embedded system, the memory for this list is going to be fixed or statically allocated, but you still don't really want an array-style list with fixed indexing, as you'll have to manage what happens when the index wraps around back to 0, etc. Instead, you just construct a linked list in that preallocated memory.
I wouldn't say linked lists aren't used in high-level applications. As I said, they're used all over the place whenever you have external asynchronous communication; it's just that modern high-level frameworks and libraries abstract this away from most people writing high-level code.
Intrusive linked lists eliminate the allocation entirely. With a vector<Obj>, you have the Obj allocation and then potential vector-related reallocations. With an intrusive linked list, you only have the Obj allocation. So your code that adds/removes list entries does no additional allocation at all, it reuses a pointer or two that was allocated as part of the original Obj allocation. Often the list manipulation happens at a time when allocation failures are inconvenient or impossible to handle.
In more complex embedded software you are likely to see free lists used to manage pools of preallocated resources (like event structs etc) or pools of fixed sized memory buffers.
A common way to implement these is to have an array of messages, sized for the worst case scenario and use this as the message pool.
You keep the unused messages in a singly linked "free list", and keep the used messages in a doubly linked queue or FIFO structure.
That way you get O(1) allocation, de-allocation, enqueue and dequeue operations for your message queue.
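The pool-plus-queue pattern above can be sketched in C as follows, assuming a hypothetical `Msg` type and a worst-case pool size `POOL_SIZE` (both illustrative). Every operation just moves a couple of pointers, so it is O(1) and never calls an allocator. In real firmware, operations shared between an interrupt handler and the main loop would also need a critical section (e.g. interrupts masked) around them; that is left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical worst-case number of in-flight messages */

typedef struct Msg {
    int payload;
    struct Msg *prev, *next;
} Msg;

static Msg pool[POOL_SIZE];      /* the only storage messages ever use */
static Msg *free_list;           /* singly linked via `next` */
static Msg *q_head, *q_tail;     /* doubly linked FIFO */

static void pool_init(void) {
    free_list = NULL;
    q_head = q_tail = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* O(1): pop from the free list; NULL means the pool is exhausted. */
static Msg *msg_alloc(void) {
    Msg *m = free_list;
    if (m) free_list = m->next;
    return m;
}

/* O(1): push back onto the free list (m must not be on the queue). */
static void msg_free(Msg *m) {
    assert(m >= pool && m < pool + POOL_SIZE);
    m->next = free_list;
    free_list = m;
}

/* O(1): append at the tail of the FIFO. */
static void msg_enqueue(Msg *m) {
    m->next = NULL;
    m->prev = q_tail;
    if (q_tail) q_tail->next = m; else q_head = m;
    q_tail = m;
}

/* O(1): take from the head of the FIFO; NULL if empty. */
static Msg *msg_dequeue(void) {
    Msg *m = q_head;
    if (m) {
        q_head = m->next;
        if (q_head) q_head->prev = NULL; else q_tail = NULL;
    }
    return m;
}

/* O(1) removal from the middle, e.g. to cancel a queued message;
   this is what the `prev` link buys over a singly linked queue. */
static void msg_unlink(Msg *m) {
    if (m->prev) m->prev->next = m->next; else q_head = m->next;
    if (m->next) m->next->prev = m->prev; else q_tail = m->prev;
}
```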
Another example of this paradigm is job queues. You might have several actuators or sensors connected to a single interface and want to talk to them. The high-level "business" logic enqueues such jobs, and interrupt-driven logic works through them in the background.
And because you only move some pointers around for each of these operations it is perfectly fine to do so in interrupt handlers.
What you really want to avoid is to move kilobytes of data around. That quickly leads to missing other interrupts in time.
I'd say most developers don't write kernels/drivers or embeds, at least from what I've seen. I am not saying that there are not many devs like this, but rather that there are fewer kernel devs than web devs.
I beg to disagree^2. Tasks, threads, and processes are often structured as rings where there is always a "next", to keep task switching simple. The overall architecture of resources can be modeled as cyclic graphs but is implemented as rings, deques, singly linked lists, and other data structures.
Linked lists shine when you can perform an O(1) remove operation because you already have a reference to an object on the list. This is very common with C structs and not possible in Java, for example.
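A minimal sketch of the ring idea in C, assuming a hypothetical `Task` struct and a simple round-robin scheduler: the ready tasks form a circular singly linked list, so the next task to run is always `current->next`, with no end-of-list check and no index wraparound. The function names are illustrative. Removal needs the predecessor, which a singly linked ring has to scan for; a doubly linked ring would make that O(1).

```c
#include <assert.h>
#include <stddef.h>

typedef struct Task {
    int id;
    struct Task *next;   /* circular: the last task points back to the first */
} Task;

/* Insert `t` right after `pos`, or start a one-element ring if pos is NULL. */
static Task *ring_insert_after(Task *pos, Task *t) {
    assert(t != NULL);
    if (pos == NULL) {
        t->next = t;     /* a one-element ring points to itself */
    } else {
        t->next = pos->next;
        pos->next = t;
    }
    return t;
}

/* Round-robin: the next task to run is simply current->next. */
static Task *schedule_next(const Task *current) {
    return current->next;
}

/* Remove `t`; returns the task after it, or NULL if the ring is now empty.
   O(n) because a singly linked ring must find the predecessor. */
static Task *ring_remove(Task *t) {
    if (t->next == t) return NULL;
    Task *prev = t;
    while (prev->next != t) prev = prev->next;
    prev->next = t->next;
    return t->next;
}
```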