Hacker News new | ask | show | jobs
by rwg 4291 days ago
Disassembling data structures nicely can take much more time than just tearing them down brutally when the process exits.

A wonderful trend I've noticed in Free/Open Source software lately is proudly claiming that a program is "Valgrind clean." It's a decent indication that the program won't doing anything silly with memory during normal use, like leak it. (There's also a notable upswing in the number of projects using static analyzers on their code and fixing legitimate problems that turn up, which is great, too!)

While you can certainly just let the OS reclaim all of your process's allocated memory at exit time, you're technically (though intentionally) leaking memory. When it becomes too hard to separate the intentional leaks from the unintentional leaks, I'd wager most programmers will just stop looking at the Valgrind reports. (I suppose you could wrap free() calls in "#ifdef DEBUG ... #endif" blocks and only run Valgrind on debug builds, but that seems ugly.)

A more elegant solution is to use an arena/region/zone allocator and place potentially large data structures (like cp's hard link/inode table) entirely in their own arenas. When the time comes to destroy one of these data structures, you can destroy its arena with a single function call instead of walking the data structure and free()ing it piece by piece.

Unfortunately, like a lot of useful plumbing, there isn't a standard API for arena allocators, so actually doing this in a cross-platform way is painful:

• Windows lets you create multiple heaps and allocate/free memory in them (HeapCreate(), HeapDestroy(), HeapAlloc(), HeapFree(), etc.).

• OS X and iOS come with a zone allocator (malloc_create_zone(), malloc_destroy_zone(), malloc_zone_malloc(), malloc_zone_free(), etc.).

• glibc doesn't have a user-facing way to create/destroy arenas (though it uses arenas internally), so you're stuck using a third-party allocator on Linux to get arena support.

• IRIX used to come with an arena allocator (acreate(), adelete(), amalloc(), afree(), etc.), so if you're still developing on an SGI Octane because you can't get enough of that sexy terminal font, you're good to go.

4 comments

Adding some kind of arena-allocation library to both the build & runtime dependencies solely to keep valgrind happy, with no actual improvement in functionality or performance, doesn't seem like a great tradeoff on the software engineering front. I'd rather see work on improving the static analysis. For example if some memory is intended to be freed at program cleanup, Valgrind could have some way of being told, "this is intended to be freed at program cleanup". Inserting an explicit (and redundant) deallocation as the last line of the program just to make the static analyzer happy is a bit perverse.

(That is, assuming that you don't need portability to odd systems that don't actually free memory on process exit.)

I don't see why you assume arenas would be added "solely to keep valgrind happy". Arenas have better performance when allocating a high number of small chunks, because an arena can make better performance trade-offs for this use case than the general malloc allocator.
Yes, we need a standard arena call(s). As you said, glibc already uses them internally. From the malloc(7) man page:

In a multithreaded application in which threads simultaneously allocate and free memory, there could be contention for these mutexes. To scalably handle memory allocation in multithreaded applications, glibc creates additional memory allocation arenas if mutex contention is detected. Each arena is a large region of memory that is internally allocated by the system (using brk(2) or mmap(2)), and managed with its own mutexes.

The "#ifdef DEBUG..." option is pretty much exactly what they did:

http://lists.gnu.org/archive/html/coreutils/2014-08/txtVTwv5...

The proper thing to do is include valgrind.h and use the macros to determine if it's running under valgrind. Alternatively to accommodate other tools, hide it behind an env var. We should make it as easy to test production binaries as possible.
The advantage of the #ifdef DEBUG method is that it's possible to ensure that all of your allocations are clean at the object level, not just at the arena level.

In the limit, you could make one arena for your entire program and then deallocate it on process exit (essentially what the OS gives you by default). Using smaller and smaller arenas lets you see that at a finer and finer granularity. In the other limit you're back to the per-object allocations.

Using an arena to avoid per-object leak checking is good if you're absolutely sure that there are no leaks in that block of code. You're essentially hiding any leaks within the arena from valgrind.

But that seems dangerous, because it makes it harder to use valgrind for what it's good at: finding leaks that you're unaware of, especially in code that you 'know' is good.

The advantage of the #ifdef DEBUG method is that it's possible to ensure that all of your allocations are clean at the object level, not just at the arena level.

One of the points of an arena allocator is to not have to be clean at object level because everything is intended to be freed anyway in the end. For example, one HTTP request or one copy operation fit perfectly for an arena allocator: the system is allowed to "leak" because it's known that the operation won't last that long within the lifetime of the process and all memory related to the operation won't be needed later. (Or cross-operation objects could be allocated from a different pool.)