Hacker News new | ask | show | jobs
by cmccabe 4755 days ago
It's nice that you made these available. You might also want to look at queue.h from BSD (see http://fxr.watson.org/fxr/source/sys/queue.h) and tree.h (http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/sys/tree.h), single-file "libraries" that can generate a few different kinds of linked list and binary tree. These don't require typecasting at all, since they generate functions for your particular type. They are also intrusive data structures, meaning you control your own allocation.

With regard to OBString: don't. Really. C strings are very flexible and powerful with just what's in the standard library. You don't need charAtStringIndex, for example: you have rindex and index. You don't need splitString; you have strtok_r. You don't need findSubstring; you have strstr.

If you think you need regexes, consider fnmatch, a libc function which matches shell wildcard patterns. It's always there, it's simple, and it's fast. Or you could bring in something like PCRE. In general, if you need full regexes in a small program, it might be worth rethinking whether your program should be in C anyway.

For some reason, I don't find myself using resizable array utility code much in C. It's pretty easy to do it yourself with realloc-- that's really one of the first things they should teach people learning practical C. Perhaps a small, minimal set of macros could wrap the reallocs and make it a little bit nicer.

5 comments

Intrusive data structures are indeed more powerful. I made my own instrusive AVL-tree which can be found here [1] and an example here [2]. There's also the extra feature that the concept of a "link" is abstracted, so you can for example build a compressed AVL-tree inside an array using array indices instead of pointers, which don't break when the array is reallocated. It's also built in a different way than this usually done (macros), that is, a header file is included which redefines some identifiers, and the actual code of the functions is not inside macros, hence, easier to read.

About strings, I firmly believe that C's zero-terminated strings should be avoided and only used when necessary to communicate with existing code. They're really not simple and fast. You can't take a null-terminated string and extract a null-terminated sub-string without modifying the original - which is very often needed during various kinds of parsing. Also, null-terminated strings are unable to represent the zero byte, so in general, for example, you can't use them to store the contents of an arbitrary file, and if you do use them for purposes where null bytes can appear, you have to be careful and handle errors where nulls would implicitly truncate your string.

Just use (pointer, length) strings instead. It'll save you a whole bunch of trouble you may not see coming.

[1] https://code.google.com/p/badvpn/source/browse/#svn%2Ftrunk%... (CAvl_)

[2] https://code.google.com/p/badvpn/source/browse/#svn%2Ftrunk%... (cavl_test_)

I don't see anything inherently broken about null-terminated strings. It means that the smallest string that can be represented is a single byte, and the largest string that can be represented is unlimited. By mixing in an integer, you're expanding the smallest string and placing an arbitrary limit on the largest string. If the integer and string data are mixed in a struct, then strings pack much less tightly, not just because of the space taken up by the integer, but also because it needs to be aligned. Most of the things you'd want to do with a string are O(n) anyway, like print to the terminal. Null-terminated strings aren't perfect for every use, but no representation is, and C doesn't prevent you from using a different representation when it's more appropriate. If you're writing a parser, then that's a more demanding application than usual, and you ought not feel locked into using the default representation.
"For some reason, I don't find myself using resizable array utility code much in C."

Most of the time, the amount of memory you'll need to do something is predictable. When dynamic arrays are always at hand, you lose the habit and end up always reaching for one.

Should you ever need said macros, I've written a set: https://github.com/wrl/wwrl/blob/master/vector.h
Thanks, that's interesting. I would probably add some kind of callback to free individual elements if I ended up using this. A lot of the boilerplate when managing resizable arrays in C ends up being making sure you properly dispose of elements. Or at least it was for me when I last did this.
Absolutely, but that's something I feel that could be implemented on top of these macros if necessary.

Currently I'm using this code just for a string buffers and other arrays where the individual elements are stored in-place in the array.

Sadly some form of OBString is a necessity, any string object needs a reference count and some function pointers packaged in the struct to work. Converting OBString to have the same interface as standard string handling functions is a good idea though, and using the standard functions as a backend would also be done in that case.
> You might also want to look at queue.h from BSD

CPP macros, entirely. No, thanks.

I'm not a fan of macros either, which is why I've ended up implementing hybrid C data structures, with the logic in functions, and some optional type helpers in macros:

https://github.com/pmj/genccont

Anyway, these are largely intrusive and don't do reference counting, but that can be seen as an advantage or disadvantage, depending on the situation. (I use these heavily in kernel code) The hash tables (chaining and open addressed linear probing) use function pointers to be type-generic, which might not be to everyone's taste either.

Well, this is a good effort. I think you are quite right to leave out reference counting. Reference counting is something that you should pay for only if you need it.

I don't really understand the fear of macros. void pointers are much scarier than macros, since they open you up to the threat of bugs going undetected by the type system.

You're not by any chance maintaining one of those much-maligned "driver portability" layers, are you?

For better debugging and diagnostics, I like my macros to do as little as possible. Plus, if the generated code is the same for all types apart from an offset calculation, duplicating it is silly.

If you look at the slist, slist_queue and dlist, you'll find there are typed macros (via a container-of macro) that call down to the generic functions. I might try to extend this to macros that generate a full set of shim inline function for a particular type of list element, if it turns out to be less error prone. You won't see a void pointer being passed around on any of these.

Where function pointers come into play (binary tree, hash tables) I'll admit it's maybe not quite so clear cut as the indirection messes with branch predictors, etc. and the callbacks all take void pointer arguments. Maybe for those containers it would be better to have generator macros that produce function prototypes and implementations.

And no, I'm not doing anything particularly focused on driver portability. I mostly write OS X kexts, and the built-in data structures in xnu aren't terribly helpful. (the OSContainer family uses blocking memory allocations, can only hold refcounted objects and the map and set use linear search) Some kernel code I write is designed to be portable, and I use these data structures for that code, but it's pretty much just a standard factored design - general core and platform specific setup and API use. Nothing with any pretence of being a general solution.

Just as an anecdote, in one of my university classes we had the assignment of writing and optimizing a memory allocator. When I switched from using structs to using pointer arithmetic in macros, I gained a significant performance boost, even though the in-memory data was exactly the same.
were you compiling with optimizations on?
Yes, gcc -O2 on Opteron I believe (it was a while ago).