Hacker News new | ask | show | jobs
by cb321 529 days ago
One aspect people don't tend to mention (and TFA does not because it has "general purpose" as a scope) about doing your own allocation arenas is that because a "pointer" is just the name (really number) of an allocated region, this also affords flexibility in the width of those numbers aka "pointer size" that decides your "address" space.

So, for example, if you are willing to limit yourself to 64 KiNodes of equal sized objects, a pool like https://github.com/c-blake/bst/blob/main/POOL.c can use an unsigned short 2-byte pointer. This small address space specialization can give a full 4x space overhead optimization over modern 64-bit pointers.

You could even use 8x less with 1-byte pointers if 256 nodes is enough or use bit fields or equivalents for even more fine-grained address space size selection. Admittedly, linked structures are usually less interesting for very small address space sizes, but this is sometimes useful. People may complain about arbitrary limits that relate to this kind of optimization, but the linked part could be "small by design" without compromising overall scale { such as for structure within a B-tree node or other multi-level indexing setups }.

In any event, I think it is useful to highlight pointer size arbitrariness to junior engineers. It is often learned once & immediately forgotten behind an onslaught of general purpose-ness (& pointer type systems/syntax). Relocation ideas also relate. E.g., if all the pointers are implicitly indices into a (small?) array, and all the code needs a "base address" of that array to get a VMem address, then you can relocate the whole bundle of objects pretty easily changing only the base address.

2 comments

‘Handles are the better pointers’¹ is a related writeup that has been discussed here before².

¹ https://floooh.github.io/2018/06/17/handles-vs-pointers.html

² https://news.ycombinator.com/item?id=36419739

I wish I had caught the original discussions. The moral of the story is that pointers contain too little metadata and not enough places to do bounds checking and type checking to be safer to handle (pun intended). Taking charge of things as “objects” is more robust.

Do notice that it means runtime bounds checking. It also mentions the problem of having a handle reference an object that’s no longer there or that has changed and that the probability of this happening rises with time. This pattern works great in single thread/single process code but starts decaying hard when you have preemptive concurrency. Here you really cannot safely do much of anything without some sort of locking without copy on write semantics (which is also the case with a lot of other systems, just pointing out that this isn’t solved magically by handles though the additional metadata can help detect use-after-free and handle-now-refers-to-a-different-object problem).

So the age old objection that bounds checking slows you down but is necessary to detect problems at runtime is here. You won’t get branch-free code with way compared to straight pointer references but on the other hand you get a lot closer to the concepts of owners and borrowers in straight C which is cool.

Those are good insights and I've probably written way too many words here already, but I wanted to amplify that your "age old" applies to the whole related "BUNDLE OF PRIMORDIAL QUESTIONS:", i.e. "What is a pointer (or name), anyway? and what do we want from it? and how do we want it? at compile/run time?" It's honestly a sub-part of Martin Fowler's famous Two Hard Things in CS joke, although I think that is more commonly (and shallowly) interpreted to be about simple identifier bike-shedding these days. There is probably some good blog post "What's in a Reference?" already written somewhere.

Different answers have different trade-offs in different settings and are variously (un)popular (e.g., x86 memory segments, LBA disk addressing, static/dynamic name scoping in PLs, "virtual" memory itself, ...). Besides the semantic/run-time/safety aspects of "referencing" you mention there are the syntactic ones I mentioned over in https://news.ycombinator.com/item?id=42644009 . Because it is so hard to get one-size-fits-all answers, I think a lot of user-defined layering makes sense, but because of all the hazards you mention, systems support for safety & perf is often really ambitious which makes the user-definition harder (for some/all programmers in the collaboration).

As another attempted witticism in this direction: "Some Assembly Required" is the general context for almost all engineering, but "how much 'some' is offputting" and 'Wait - assembly by WHO?" are big open questions that crystallize around various axes in various human communities. Lately, my one-liner is "Delegation affords so much, but trust sure is tricky!".

Yes, this is true. I didn't use that in the code because I thought it would make it less readable, though. I might mention it somewhere, thank you.
In many prog.lang's, it absolutely makes things less readable. This is not an accident, but because the PL itself specifically tries to syntactically support pointer types (i.e. to literally aid readability). Most PLs do so only for general purpose VMem pointers with user-defined pointers being afterthoughts (at best). You probably need (at least!) operator overloading to get the same level of readability as "PL-native" pointers { though "readability" is a bit subjective and powerful PL features can be divisive; Scare "quotes" for interpretation hazard :-) }.

Anyway, you probably knew all that & I don't mean to suggest otherwise, but it made sense to write since HN loves PL talk.