| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by def-lkb 2251 days ago

That might seem counter-intuitive, but tracing GC is the fastest way to manage memory in general (arbitrary life time, graph shapes and allocation patterns).

It permits really fast allocation, defragmentation and is sometime the only way to manage memory (cyclic datastructures for which reachability cannot be known directly from program text).

Reference-counted GC on the other hand is notoriously slow, incomplete, and exhibit lots of bad performance patterns (that can be somehow managed by taking some elements of tracing GCs).

The drawback is that tracing GCs are very hard to implement, to tune, and sometimes not all of the benefits are available at the same time. And because they are so counter intuitive, programmers in general have a hard time understanding their runtime behavior... Even GC experts struggle (the skills required range from low-level hardware to control theory, otherwise performance at scale is unpredictable).

1 comments

Rochus 2251 days ago

In C/C++ we have the possibility to avoid dynamic allocation altogether and to use system features like memory mapping. If we use C++ the same way as Java (everything dynamically) it's not too surprising the result is not (much) faster than Java.

link

aidenn0 2251 days ago

Their original implementation was in common lisp. Common Lisp also lets you use mmap (in fact it's not that uncommon to do so if you have a large amount of mostly static data) to manage your memory manually.

They clearly wanted automatic memory management, so the C++ implementation is reasonable. A fairer comparison might have used MPS or boehm instead of refcounting, but I suspect the results would have been similar.

link

nemetroid 2251 days ago

> the C++ implementation is reasonable

Fig. 4 shows that deallocation, presumably of objects allocated during the first phase, takes half as long as the first phase itself. Unless the nature of the problem solved by the program requires you to have a ton of objects with shared mutability (the use case for shared_ptr), and requires you to make a lot of allocations up-front without knowing if you will need them, the memory thrashing occurring does not seem reasonable.

link

Rochus 2251 days ago

> Common Lisp also lets you use mmap

But not without allocating dynamic memory and copying data.

> They clearly wanted automatic memory management

Most likely because of some misconceptions.

> so the C++ implementation is reasonable.

How so?

> but I suspect the results would have been similar

Don't forget the data sets to be filtered, sorted an analyzed are up to 200 GB.

link

aidenn0 2251 days ago

> > Common Lisp also lets you use mmap

> But not without allocating dynamic memory and copying data.

Sure it does. In SBCL you can force a stack allocation (though rarely does it improve performance), and very short-lived values do not leave registers in any case.

> > They clearly wanted automatic memory management

> Most likely because of some misconceptions.

There are both good and bad reasons to want automatic memory management. At least one good reason it it would decrease the porting effort by keeping the code similar.

> > so the C++ implementation is reasonable.

> How so?

Using reference counting is a reasonable way to get automatic memory management in C++

> > but I suspect the results would have been similar

> Don't forget the data sets to be filtered, sorted an analyzed are up to 200 GB.

Which is going to be rough on any automatic memory management system, which makes using a language with a better ecosystem of automatic memory management more performant.

link

candiodari 2251 days ago

> > Don't forget the data sets to be filtered, sorted an analyzed are up to 200 GB.

> Which is going to be rough on any automatic memory management system

You could equally say that it makes the case for actually designing memory allocation strategy (which only C++ really supports) that much more important.

You always see this in Java programs for large data analysis. They pick java because of memory management and the tooling. But it's just SO slow and after optimisation the only thing that stubbornly remains up there in the profiler data is memory and GC. And what do they do?

A global object of the following form:

  class DataStore {
    float theFloatsWeNeed[constHowMany];
    int theIntsWeNeed[anotherConst];
  }

You get the idea. Because this avoids memory allocation in java. And you use the flyweight pattern to pass data around. Or you fake pointer arithmetic in java. You create your own pointers by specifying indexes and you oh the horror use math on those indexes. Even then just checking those indexes actually becomes a significant time sink (and then you disable that, which of course kills memory safety in java, but you won't care).

The truth is you don't want memory management for large amounts of data. You don't want to allocate it, track it or deallocate it at all. You leave it in it's on-disk data format and never serialize/deserialize it at all. You want to mmap it into your program, operate on it and then just close the mmap when you're done. C++ definitely has the best tools for this way of working.

link

Rochus 2251 days ago

Yeah, right. The people who know this can apparently be counted on one hand when I look at the advertised publication and the discussion in this forum.

link

Rochus 2251 days ago

Common Lisp: It is permissible for an implementation to simply ignore such declarations. And you still have to copy.

Ref counting: only makes sense in a few special cases.

Avoiding dynamic memory management: have a look at mmap.

link

aidenn0 2251 days ago

C++: It is permissible for an implementation to allocate every local variable on the heap.

Going back to my original point, you are suggesting a complete rearchitecture of their allocation system. That does not require switching languages to C++. If we are talking about working with 100s of GB of data, that's probably even the correct approach!

TFA does not, however, claim that they have a working set of 100s of GB of data. The data is 100s of GB at rest, but can be processed in chunks with a single pass. That, by itself, does not scream "mmap" to me. On top of that, the data is compressed at rest, so copying is inevitable.

link

kazinator 2250 days ago

It's permissible for a C compiler to emit shell scripts.

link