Hacker News
by jasode 3535 days ago
>I can't help but notice that C++ community has a strong bias against garbage collection. [...]

>Why can't you acknowledge that there are problems that have GC as the only and best solution?

Your prelude and the follow-up question are not well-formed.

C++ programmers do not have a bias against GC as a specific problem-solving technique. In fact, expert C++ programmers can embrace GC so much that they can write an entire virtual machine[1] with GC, and a language[2] for that VM that takes advantage of memory safety. Both the CLR VM and the (original) C# compiler were written by C++ programmers.

What the C++ community doesn't want is GC in the C++ base language itself or the standard runtime. That's a very different concept from a generalized "C++ bias against GC".

In other words, the following is unacceptable:

  std::string x = "Hello, " + fullname; // cpu cycles spent on GC

Those CPU cycles spent constantly checking whether "x" is still reachable are CPU power taken away from rendering frames of a 60fps game, computing numeric equations, or high-speed quantitative trading. C++ programmers don't want GC as a global runtime that you can't opt out of. Also, global GC often requires 2x-3x the memory footprint of working memory which is extremely wasteful for the resource constrained domains that C++ is often used in.

Herb Sutter's presentation is compatible with "pseudo-GC-when-you-need-it" without adding GC to the entire C++ standard runtime.

[1]https://en.wikipedia.org/wiki/Common_Language_Runtime

[2]https://en.wikipedia.org/wiki/C_Sharp_(programming_language)


I agree with you, but in this

> std::string x = "Hello, " + fullname; // cpu cycles spent on GC

you are already spending cycles on memory management (if C++ allocates the character data on the heap, which I think it does). You are searching for free space in the heap, possibly synchronising to do that, and so on.

With a GC you may even use fewer cycles here! For example, a copying GC could mean that you can allocate with a simple thread-local bump pointer.

So in your statement you are already paying an unknown cycle cost for memory management. Why do you care if it's GC?

Your answer is probably the variance in the number of cycles - the noticeable pauses - which is a reasonable concern.

Yes, but in this case we know when the allocations will occur, and when they will be freed. If using a GC we know when they will occur, but do not know when they will be freed. Which means that at some indeterminate point in the future there will be a large temporary slowdown due to processing the GC.

This is one of the bigger reasons people use C++, and techniques within it, to explicitly collect such items at a known point in time (for example, marking items as dead in an array but keeping them there until the end of the frame).
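A minimal sketch of the end-of-frame technique, assuming a hypothetical `Entity` type and `EntityList` container (not from any particular engine): killing an entity only flips a flag, and the actual removal happens in one pass at a point the program chooses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical game entity: instead of erasing it mid-frame, its "alive"
// flag is cleared so indices stay stable until the end of the frame.
struct Entity {
    int id;
    bool alive;
};

class EntityList {
public:
    void spawn(int id) { entities_.push_back(Entity{id, true}); }

    // Mid-frame: O(1), nothing is moved or released.
    void kill(std::size_t index) { entities_[index].alive = false; }

    // At a known point (end of frame): drop every dead entity in one pass.
    // remove_if keeps the survivors in their original order.
    void end_of_frame() {
        entities_.erase(std::remove_if(entities_.begin(), entities_.end(),
                                       [](const Entity& e) { return !e.alive; }),
                        entities_.end());
    }

    std::size_t size() const { return entities_.size(); }
    const Entity& operator[](std::size_t i) const { return entities_[i]; }

private:
    std::vector<Entity> entities_;
};
```

The vector keeps its capacity after `erase`, so no memory goes back to the allocator during the frame either.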

> If using a GC we know when they will occur, but do not know when they will be freed. Which means that at some indeterminate point in the future there will be a large temporary slowdown due to processing the GC.

This just isn't true anymore. Incremental collectors can achieve pause times in the single-digit millisecond range, and concurrent collectors can achieve pause times in the single-digit to tens of microseconds range, even for programs with very high allocation rates. There are even real-time collectors suitable for audio and other hard real-time applications.

Azul GC (a high performance concurrent compacting collector): https://www.azul.com/products/zing/pgc/

Metronome (a real-time GC with guaranteed maximum pause times): http://researcher.watson.ibm.com/researcher/view_group_subpa...

GC scheduling in V8 (hides GC pause times between frames, reducing animation jank): http://queue.acm.org/detail.cfm?id=2977741

Go now ships with a very low-latency GC: https://blog.golang.org/go15gc

> This just isn't true anymore. Incremental collectors can achieve pause times in the single-digit millisecond range

Single digit milliseconds is millions of instructions, which /is/ a large slowdown in some applications.

How much do you think a page fault costs?
Page faults cost zero. On locked pages.
A millisecond is a huge amount of time, and in that time we can do so many more things more useful to the application than just collecting its trash.
Again downvotes. It'd be nice if people actually replied instead of downvoting.
This is the advantage of C++ for me. Deterministic behaviour.
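One way to make the "deterministic" part concrete is to count live heap blocks around a scope. This is a sketch for illustration only: it replaces the global `operator new`/`operator delete` with counting versions (real programs rarely do this), and `observe_scope_release` is a hypothetical helper. It shows that the string's buffer is freed exactly at the closing brace, not at some later collection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Illustration only: replacement global allocation functions that count
// how many heap blocks are currently live. Not thread-safe.
static std::size_t g_live_blocks = 0;

void* operator new(std::size_t size) {
    if (void* p = std::malloc(size == 0 ? 1 : size)) {
        ++g_live_blocks;
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept {
    if (p != nullptr) {
        --g_live_blocks;
        std::free(p);
    }
}

void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }

// Publishing the pointer keeps the optimizer from eliding the allocation.
static const char* volatile g_escape = nullptr;

struct ScopeObservation {
    std::size_t live_inside;  // extra blocks while x is in scope
    std::size_t live_after;   // extra blocks right after the closing brace
};

// Hypothetical helper: measures allocations around the scope of x.
ScopeObservation observe_scope_release(const std::string& fullname) {
    const std::size_t before = g_live_blocks;
    std::size_t inside = 0;
    {
        std::string x = "Hello, " + fullname;  // heap-allocates if too long for SSO
        g_escape = x.data();
        inside = g_live_blocks - before;
    }  // x's destructor runs here and frees its buffer immediately
    return {inside, g_live_blocks - before};
}
```

This shows *when* memory is freed, not how long each `malloc`/`free` call takes; the allocator's own work per call can still vary.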
But is it deterministic? Will the allocator always have a deterministic amount of work to do to find enough free space for your string characters? I'm not sure that's the case.
Which allocator? For those who care, they replace the allocator.
You can deactivate GC momentarily and reclaim memory when you want (end of frame, every ten frames, ...). Most of the time you can manage to write your code to minimize allocation, or make sure memory is allocated on the stack. Depending on your parameters and a little bit of profiling, you can manage to have a stable usage of memory over time and a bounded GC time.
malloc/free/etc. will only cost when they're called; GC can be a continuous expense even when there's no collection.
It is a matter of how it is implemented.
Most GCs will not cost you a single cycle if you never allocate.
This is technically correct (though in most GCs, if you allocate and keep even a single byte, you pay for it forever with various barriers and the like), but precisely because they have GCs like this, almost every GC'd language in use allocates all the time.

So it would be more accurate to say: "Most GCs will not cost you a single cycle if you, the underlying language runtime, and the execution environment do not allocate".

I.e., your statement, while true in theory, makes literally no practical difference if allocations happen all the time without you knowing about or controlling them.

But in most GC languages there is nothing you can do without allocating. Creating an object is already allocating it on the heap, printing a string will also allocate.
Not in GC languages that also have value types.

For example you can do all of that in Modula-3, Active Oberon, Component Pascal and many others without allocating more than in C for example.

Mixing all GC enabled languages in the same box is a mistake.

I don't know why they downvote you. Even in Java, which has no value types yet, there are ways to write useful code with no or almost no managed heap allocation. And if you don't need ultra-low pauses, mixing these techniques (e.g. using the managed heap for the 80% of code that is not performance-critical and careful manual dynamic allocation for the remaining 20%, such as large data buffers) typically gets you all the good things at once: high throughput, low pauses, almost no GC activity when it matters, and convenience when writing most of the code.
I recently wrote my own memory allocator for my own String class. Now my String is at least 2x faster than the next fastest alternative (I have tried many alternatives for C++ strings, including std::string of course). String allocation can be made very fast with thread-local memory pools (and you just need a basic GC to free up memory if many strings are allocated in one thread but destroyed in another).
It is very likely that in this particular case the small string optimization would store the data inline in the string object (here, on the stack), so no heap allocation would take place.
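Whether a given string uses the small string optimization can be checked by seeing whether its character data lies inside the `std::string` object itself. A sketch with a hypothetical helper `stored_inline`, assuming a standard library with SSO (libstdc++'s C++11 ABI, libc++ and MSVC all have one); the inline capacity is implementation-defined, so only short strings such as "Hello, Bob" are expected to stay inline.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: true if the characters are stored inside the
// std::string object itself (SSO), false if they live in a separate heap
// buffer. std::less gives a well-defined ordering even for pointers into
// unrelated objects.
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(s);
    const char* chars = s.data();
    std::less<const char*> before;
    return !before(chars, object_begin) && before(chars, object_end);
}
```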
> For example a copying GC could mean that you can allocate with a simple thread local bump pointer.

This is equally true for explicit memory allocation. The point is that on some allocations under GC, it will have to collect garbage. And collecting garbage will tend to be more expensive than explicit frees, because it usually has to do work to discover what is garbage.
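To illustrate why discovering garbage costs work that an explicit free does not, here is a toy mark-and-sweep collector over a hypothetical index-based heap (`Object`, `collect` and `explicit_free` are illustrative names, not any real collector's API). In this simple design, marking visits every reachable object and sweeping touches the whole heap, whereas `explicit_free` touches one object because the caller already knows it is garbage. Production collectors (generational, copying, concurrent) are organized very differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap for illustration: objects refer to each other by index.
struct Object {
    std::vector<std::size_t> refs;
    bool marked = false;
    bool freed = false;
};

struct CollectStats {
    std::size_t traced;  // reachable objects visited while marking
    std::size_t freed;   // unreachable objects reclaimed by the sweep
};

// Explicit free: the caller already knows this object is garbage. O(1).
void explicit_free(std::vector<Object>& heap, std::size_t i) { heap[i].freed = true; }

// Toy mark-and-sweep: garbage is only known after tracing everything
// reachable from the roots, then scanning the whole heap.
CollectStats collect(std::vector<Object>& heap, const std::vector<std::size_t>& roots) {
    CollectStats stats{0, 0};
    std::vector<std::size_t> pending(roots);
    while (!pending.empty()) {
        std::size_t i = pending.back();
        pending.pop_back();
        if (heap[i].marked || heap[i].freed) continue;
        heap[i].marked = true;
        ++stats.traced;
        for (std::size_t j : heap[i].refs) pending.push_back(j);
    }
    for (Object& o : heap) {  // sweep touches every object, live or dead
        if (!o.freed && !o.marked) {
            o.freed = true;
            ++stats.freed;
        }
        o.marked = false;
    }
    return stats;
}
```

Tracing does handle unreachable cycles (objects 3 and 4 in the test below), which is one of the things it buys in exchange for that extra work.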

There is an advantage to universal GC: it can simplify interfaces.

For example, in C every time a `const char *` appears in an API there has to be an implicit contract about who owns the string and how to free it. A language like Rust improves on this by enforcing such contracts, but the contracts still complicate every interface.

In a GC'd language you can just forget about these issues and just treat strings (and other immutable objects) almost as if they were ordinary fixed-sized values.
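In C++ the ownership contract can at least be written into the signature instead of the documentation. A sketch with hypothetical functions, assuming C++17 for `std::string_view`: returning `std::string` hands the caller its own copy, `std::string_view` signals a borrow, and `std::unique_ptr<char[]>` transfers ownership of a raw buffer. Unlike Rust, the compiler does not check that a borrowed view stays valid, so the contract is expressed but not enforced.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>

// C-style: the type says nothing about ownership.
//   const char* get_name(void);   /* caller frees? static buffer? */

// Owning return: the caller gets its own copy, freed automatically.
std::string make_greeting(std::string_view name) {
    std::string s = "Hello, ";
    s.append(name.data(), name.size());
    return s;
}

// Borrowing return: valid only while `owner` is alive and unmodified.
// Nothing checks this at compile time.
std::string_view first_word(const std::string& owner) {
    return std::string_view(owner).substr(0, owner.find(' '));
}

// Explicit transfer of a raw, NUL-terminated buffer.
std::unique_ptr<char[]> copy_to_buffer(std::string_view text) {
    auto buf = std::make_unique<char[]>(text.size() + 1);  // value-initialized to '\0'
    text.copy(buf.get(), text.size());
    return buf;
}
```

Note that the owning versions copy the data, which is the usual price of keeping ownership simple without a GC.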

And C++ is all about choosing trade-offs.

Put very simply: make the choice that fits your problem and deal with its consequences.

Nobody says GC has no advantages, the claim is that its disadvantages don't always offset them.
The way C/C++ code gets around the implicit ownership of `const char *` is that everyone copies data around.
"Also, global GC often requires 2x-3x the memory footprint of working memory which is extremely wasteful for the resource constrained domains that C++ is often used in."

Memory fragmentation in manual memory management often comes at a similar or higher cost, which is often "forgotten" and neglected, because it is hidden:

https://bugzilla.mozilla.org/show_bug.cgi?id=666058#c31

And fragmentation may grow over time much more than the typical GC overhead. Then you have to do a "forced GC", i.e. restart the application. This often happens with my phone or browser; I'm not sure whether it is due more to fragmentation or to plain old memory leaks.

> Your prelude and the followup question is not well-formed.

Thank you for your feedback, I appreciate it.

> C++ programmers do not have a bias against GC as a specific problem-solving technique. Expert C++ programmers can embrace GC

Please watch this portion of the video, and the way the speaker is pronouncing the word "collect":

https://www.youtube.com/watch?v=JfmTagWcqoE#t=1h5m25s

Regarding general-purpose GC:

I don't support using GC for everything. If there is a problem that can only be solved with GC, it doesn't mean that we should express our entire running programs in terms of dynamic cyclic graphs. I believe we can do much better, with much fewer resources.

>Please watch this portion of the video, and the way the speaker is pronouncing the word "collect":

Herb was deliberately using circumlocution to avoid the phrase "garbage collection" so as to not poison the well. The latter part of the video explains his reasoning for presenting it that way:

https://www.youtube.com/watch?v=JfmTagWcqoE&t=1h32m55s

>I don't support using GC for everything. [...] I believe we can do much better, with much fewer resources.

This is a statement that sounds reasonable and balanced. How can anyone possibly disagree with it?!? The issue is that it's very difficult to take that universal wisdom and construct a real programming language that satisfies all the following properties seamlessly and elegantly:

1) language that has optional GC

2) has zero cost when not using it. This means no active tracing loop that checks object graphs and no large memory blocks reserved to allow reallocation and defragmentation.

3) transparent syntax that's equally elegant for manual memory or GC in the same language

To make your GC-when-appropriate advice real, one has to show a real implementation. E.g. one can fork the GCC or LLVM compilers and add GC to them. Or fork the Java OpenJDK compiler and show how syntax can remove the GC on demand. Or construct a new language from scratch. There may be some lessons learned from the D language backing away from GC. Apple also retreated from GC in Objective-C.

> To make your GC-when-appropriate advice real, one has to show a real implementation.

I haven't tested it closely, but I believe that Rust with this crate: https://github.com/Manishearth/rust-gc is an implementation for "GC only when you need it".

Rust would be a perfect GC-optional language since, with per-thread heaps, threads that use GC can be completely segregated from those that don't.
Microsoft's Managed C++ is frankly scary, but its gc root type seems pretty close to this.
Right, but they pay a different cost there that isn't listed: "It's not entirely compatible with the existing language".

For example, private and multiple inheritance do not work (yes, I know some of this is fixable, but some is not):

  public __gc class one { int i; };
  public __gc class two: private one { int h; }; // error: private inheritance

  __gc class a {};
  __gc class b {};
  __gc class c: public a, public b {}; // error: multiple inheritance
etc.
Which one (or more) of those three properties does the D language fail to satisfy?
Another aspect against GC is system determinism. An unmanaged runtime is more deterministic and is often favored in real-time applications.
Only if malloc()/free() don't need to call OS APIs; if they do, good luck with being deterministic.
At least in the real-time applications I have experience with, this is often mitigated with custom memory allocators that pre-allocate large contiguous blocks before performance-critical sections and then parcel out segments of those blocks at runtime.
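A minimal sketch of such an allocator, assuming a single thread and power-of-two alignments (`Arena` is a hypothetical name): one block is reserved up front, each allocation only bumps an offset, and everything is released at once with `reset()`. This is the same bump-pointer fast path mentioned earlier for copying GCs, applied to explicit memory management.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal single-threaded bump-pointer arena (a sketch, not production code).
class Arena {
public:
    explicit Arena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Carves the next `size` bytes out of the pre-allocated block.
    // `align` must be a power of two. Returns nullptr when the block is
    // exhausted rather than falling back to malloc or the OS.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t current = base + offset_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t end = static_cast<std::size_t>(aligned - base) + size;
        if (end > buffer_.size()) return nullptr;
        offset_ = end;
        return reinterpret_cast<void*>(aligned);
    }

    // Releases everything at once, at a point the program chooses.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

The standard library has offered a ready-made version of this idea since C++17: `std::pmr::monotonic_buffer_resource` in `<memory_resource>`.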
Which is also possible in GC-enabled languages, so one can use a GC and then apply similar techniques to the code paths that really require them.

For example, in real-time JVMs.

One does not need to ever use malloc()/free() or new/delete in C++. For instance, it is very common to use just the DATA and BSS segments in embedded code, and rely on placement-new, which is always deterministic.

Dynamic memory allocation is also a choice in C++.
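A sketch of the heap-free style described above, assuming C++17 (for `std::launder`) and a hypothetical `Reading` type: storage is a statically allocated, suitably aligned byte array (zero-initialized globals like this typically end up in BSS), and objects are constructed in it with placement-new. Nothing here touches the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical record type used only for illustration.
struct Reading {
    int channel;
    double value;
    Reading(int c, double v) : channel(c), value(v) {}
};

namespace {
constexpr std::size_t kMaxReadings = 8;
// Zero-initialized static storage (typically placed in .bss).
alignas(Reading) unsigned char g_storage[kMaxReadings * sizeof(Reading)];
std::size_t g_count = 0;
}  // namespace

// Constructs a Reading in the next free slot with placement-new.
// Returns nullptr once the fixed capacity is used up.
Reading* emplace_reading(int channel, double value) {
    if (g_count == kMaxReadings) return nullptr;
    void* slot = g_storage + g_count * sizeof(Reading);
    ++g_count;
    return new (slot) Reading(channel, value);
}

// Placement-new objects are destroyed explicitly, never with delete.
void clear_readings() {
    for (std::size_t i = 0; i < g_count; ++i) {
        std::launder(reinterpret_cast<Reading*>(g_storage + i * sizeof(Reading)))->~Reading();
    }
    g_count = 0;
}
```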

Going a bit off topic, that is something you can also do in Ada or SPARK, while having a more type safe language.

My problem when I used to code in C++ was working with others in corporate projects.

No matter how hard I strove to write modern code, using best practices to write safe code, there were always team members who never moved beyond C with Classes, never mind the processes in place to prevent precisely that.

Even nowadays, I get the feeling most developers don't make any use of static analysis.

Sadly, I don't think that's a problem that a language alone can solve. Ada, for instance, is awesome with its runtime enforcement of design contracts, but all a programmer needs to do is change the contract and chaos ensues.

I'm a huge fan of strong typing. I'm also actively trying to find ways to improve static analysis and formal methods. But, if you can't trust your developers, it all eventually breaks down.

I find that for most mature projects, a good developer needs 3-6 months of ramp-up time, which should include knowledge transfer, training, and restricted commit access. The point of this isn't to haze the developer, but instead to give him/her a chance to fully grok the intent of the mechanisms in the code base and to (hopefully) present and socialize better options in a controlled way. More and more, I've come to the conclusion that a strong team mechanic is one of the mandatory components of good software. DBC, code reviews, unit testing, formal analysis, and static analysis all help to reinforce this mechanism, but if the tribal knowledge of the team breaks down, then so ultimately will the quality of the software produced.

There are real-time GCs that depend on using uniform blocks of memory. All real-time allocators are going to involve similar pain.
They are not free, though, as they trade throughput for a guaranteed latency upper bound.

edit: clarify
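The uniform-block idea from the real-time GC comment above also shows up in manual allocators. A sketch of a fixed-size block pool (hypothetical `BlockPool`, single-threaded, assuming `block_size` is a multiple of the alignment the stored objects need): allocation and release are constant-time pops and pushes on a free list, and the "pain" shows in the interface, since requests larger than one block fail and smaller ones waste the rest of the block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-size block pool (single-threaded sketch). Assumes block_size is a
// multiple of the alignment required by whatever is stored in a block.
class BlockPool {
public:
    BlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(block_size), storage_(block_size * block_count) {
        free_.reserve(block_count);  // so release() never reallocates
        for (std::size_t i = block_count; i > 0; --i)
            free_.push_back(storage_.data() + (i - 1) * block_size_);
    }

    // O(1): pop a free block. Fails for oversized requests or an empty pool.
    void* allocate(std::size_t size) {
        if (size > block_size_ || free_.empty()) return nullptr;
        unsigned char* block = free_.back();
        free_.pop_back();
        return block;
    }

    // O(1): push the block back onto the free list.
    void release(void* block) { free_.push_back(static_cast<unsigned char*>(block)); }

    std::size_t available() const { return free_.size(); }

private:
    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    std::vector<unsigned char*> free_;
};
```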

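A GC API definition is already in the standard. It was added in C++11 (`std::declare_reachable`, `std::undeclare_reachable`, `std::declare_no_pointers`, `std::undeclare_no_pointers` and `std::get_pointer_safety` in `<memory>`).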