by menaerus 1351 days ago
> despite freeing more memory than we allocate

> despite DuckDB freeing more buffers than it is allocating

Can you please clarify how that is even possible?


We are allocating and freeing buffers repeatedly. Despite freeing more buffers than we allocate, memory usage might still increase because of fragmentation inside the allocator. Essentially, fragmentation can leave freed gaps that are too small or too scattered to be reused or returned to the OS, so they still count toward the process's memory usage. This phenomenon is called heap fragmentation [1].

[1] https://cpp4arduino.com/2018/11/06/what-is-heap-fragmentatio...
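
To make the effect concrete, here is a minimal sketch of a toy first-fit heap. It is not how glibc malloc or DuckDB works; `ToyHeap` and `fragmentation_demo` are hypothetical names invented for illustration, and the sizes are arbitrary. It allocates eight 4-byte buffers, frees every other one (16 bytes), then requests a single 8-byte buffer. No hole is large enough, so the heap has to grow even though more was freed than allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model of a heap (an assumption for illustration, not a real allocator):
// a growable array of one-byte cells.
struct ToyHeap {
    std::vector<int> cells;  // 0 = free, otherwise the id of the owning buffer

    // First-fit: reuse the first run of `size` contiguous free cells.
    // If no such run exists, grow the heap by appending `size` new cells.
    void alloc(int id, std::size_t size) {
        std::size_t run = 0;
        for (std::size_t i = 0; i < cells.size(); ++i) {
            run = (cells[i] == 0) ? run + 1 : 0;
            if (run == size) {
                for (std::size_t j = i + 1 - size; j <= i; ++j) cells[j] = id;
                return;
            }
        }
        cells.insert(cells.end(), size, id);
    }

    // Mark every cell owned by `id` as free. The heap itself never shrinks.
    void release(int id) {
        for (int& c : cells) if (c == id) c = 0;
    }

    std::size_t footprint() const { return cells.size(); }

    std::size_t live() const {
        std::size_t n = 0;
        for (int c : cells) if (c != 0) ++n;
        return n;
    }
};

// Returns {footprint before the churn, footprint after the churn}.
std::pair<std::size_t, std::size_t> fragmentation_demo() {
    ToyHeap heap;
    for (int id = 1; id <= 8; ++id) heap.alloc(id, 4);    // 8 buffers of 4 bytes
    std::size_t before = heap.footprint();
    for (int id = 1; id <= 8; id += 2) heap.release(id);  // free 16 bytes in 4-byte holes
    assert(heap.live() == 16);
    heap.alloc(9, 8);  // needs 8 contiguous bytes; every hole is only 4 bytes wide
    return {before, heap.footprint()};
}
```

Calling `fragmentation_demo()` returns a footprint of 32 before and 40 after: 16 bytes were freed and only 8 allocated, yet the heap grew, because the freed space was split into holes too small for the new request.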

> Despite freeing more buffers than we allocate

Technically, I hope you understand that this isn't possible, but maybe I am misinterpreting what you're trying to say.

  auto buff = malloc(N);
  free(buff);
  free(buff);
is one way to free "more" buffers than allocated, but this is a double free, which is undefined behavior (UB); depending on the underlying system allocator implementation, it may or may not crash.

However, given how silly this would be, I assume this is not what you're trying to convey?

Here's what mytherin wrote: "...we are allocating and freeing buffers repeatedly. Despite freeing more buffers than we allocate..."

So I assume the context is: DuckDB allocates x buffers, frees x - m buffers at some point later, then allocates n buffers where n << m, and yet malloc fails.

In the GitHub thread mytherin linked to above, Alexey Milovidov, ClickHouse CTO, points out that ClickHouse uses jemalloc, which makes for a better choice than glibc malloc given the issue with fragmentation. It is likely that DuckDB will switch to jemalloc, too.

You are misinterpreting it indeed.

The scenario I am describing is roughly the following:

Suppose we allocate 100K buffers that all have an equal size, and our memory usage is now 10GB. After that point we free 20K buffers, but allocate 10K more. In other words, from that point on we are freeing more buffers than we are allocating.

Now, since we are freeing more than we are allocating, you would expect our memory usage to go down. However, when using the standard glibc malloc on Linux, our memory usage unexpectedly goes up. After this happens several times in a row, the system runs out of memory and new calls to malloc fail.
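
A scaled-down sketch of that scenario (100 buffers instead of 100K, free 20, allocate 10) shows why the drop you would expect may not happen. This is a toy model under a stated assumption, namely that the heap can only hand memory back from its tail; `SlotHeap` and `footprint_after_churn` are hypothetical names, and the model does not reproduce the growth seen with glibc, only why a net free need not lower usage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (assumption): equal-size slots, and only trailing free slots
// can be given back, loosely like a heap that can only shrink from the top.
struct SlotHeap {
    std::vector<bool> used;

    // Reuse the first free slot, or grow the heap by one slot.
    std::size_t alloc() {
        for (std::size_t i = 0; i < used.size(); ++i) {
            if (!used[i]) { used[i] = true; return i; }
        }
        used.push_back(true);
        return used.size() - 1;
    }

    // Free a slot, then trim any free slots sitting at the tail of the heap.
    void release(std::size_t slot) {
        assert(slot < used.size() && used[slot]);
        used[slot] = false;
        while (!used.empty() && !used.back()) used.pop_back();
    }

    std::size_t footprint() const { return used.size(); }
};

// Allocate 100 buffers, free 20, allocate 10 more; return the final footprint.
// `scattered` picks which 20 are freed: every 5th slot, or the last 20 slots.
std::size_t footprint_after_churn(bool scattered) {
    SlotHeap heap;
    for (int i = 0; i < 100; ++i) heap.alloc();
    for (std::size_t k = 0; k < 20; ++k) {
        heap.release(scattered ? k * 5 : 99 - k);
    }
    for (int i = 0; i < 10; ++i) heap.alloc();
    return heap.footprint();
}
```

In this model `footprint_after_churn(true)` stays at 100 slots with only 90 buffers live, because a live buffer at the top pins everything below it, while `footprint_after_churn(false)` ends at 90. Which buffers get freed matters as much as how many.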