Hacker News
by aaronbwebber 1481 days ago
I love Go and goroutines, but...

> A newly minted goroutine is given a few kilobytes

a line later

> It is practical to create hundreds of thousands of goroutines in the same address space

So it's not practical to create 100s of Ks of goroutines. It's possible, sure, but you incur GBs of memory overhead if you actually create that many, so for any practical problem you are going to want to stick to a few thousand goroutines. I can almost guarantee you that you have something better to do with those GBs of memory than store goroutine stacks.

Asking the scheduler to schedule 100s of Ks of goroutines is also not a great idea, in my experience.

> So it's not practical to create 100s of Ks of goroutines. It's possible, sure, but you incur GBs of memory overhead if you actually create that many, so for any practical problem you are going to want to stick to a few thousand goroutines. I can almost guarantee you that you have something better to do with those GBs of memory than store goroutine stacks.

You lost me in a couple places:

1) "GBs of memory overhead" being a lot. A rule of thumb I've seen in a datacenter situation is that (iirc) 1 hyperthread and 6 GiB of RAM are roughly equivalent in cost. (I'm sure it varies over processor/RAM generations, so you should probably check this on your platform rather than take my word for it.) I think most engineers are way too stingy with RAM. It often makes sense to use more of it to reduce CPU, and to just spend it on developer convenience. Additionally, often one goroutine matches up to one incoming or outgoing socket connection (see below). How much RAM are you spending per connection on socket buffers? Probably a lot more than a few kilobytes...

2) The idea that you target a certain number of goroutines. They model some activity, often a connection or request. I don't target a certain number of those; I target filling the machine. (Either the most constrained resource of CPU/RAM/SSD/disk/network if you're the only thing running there, or a decent chunk of it with Kubernetes or whatever, bin-packing to use all dimensions of the machine as well as possible.) Unless the goroutines' work is exclusively CPU-bound, of course; then you want their number to match the number of CPUs available, so thousands is already too many.
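For the exclusively CPU-bound case, a common Go pattern is to size the worker count from `runtime.GOMAXPROCS(0)` instead of starting one goroutine per item. A minimal sketch, assuming the work can be split into contiguous chunks; `sumSquares` is a hypothetical stand-in for real CPU-bound work:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares is a hypothetical stand-in for CPU-bound work. It starts one
// worker per usable CPU (runtime.GOMAXPROCS(0) reads the setting without
// changing it) and hands each worker one contiguous chunk of the input.
func sumSquares(nums []int) int {
	workers := runtime.GOMAXPROCS(0)
	chunk := (len(nums) + workers - 1) / workers // ceiling division
	partials := make([]int, workers)             // one slot per worker, so no locking

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(nums) {
			break // fewer items than workers
		}
		hi := lo + chunk
		if hi > len(nums) {
			hi = len(nums)
		}
		wg.Add(1)
		go func(w int, part []int) {
			defer wg.Done()
			for _, n := range part {
				partials[w] += n * n
			}
		}(w, nums[lo:hi])
	}
	wg.Wait()

	total := 0
	for _, p := range partials {
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i + 1
	}
	fmt.Println(sumSquares(nums)) // 333833500
}
```

The goroutine count here stays at the CPU count no matter how large the input gets, which is the point of the "match the number of CPUs" rule for CPU-bound work.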

I agree that GBs for 100Ks of goroutines is not, in some sense, "a lot", in that you might still be using memory pretty effectively. But I don't see how a "6 GB vs. 1 core" tradeoff makes any sense to talk about.

We have HTTP ingress that needs ~100 cores but could theoretically all fit in 1GB. We have k/v stores that need only 16 cores but would like 500GB. And we have data points at most places in-between. We can't give the ingress 600GB instead, and we can't give the k/v stores 100 cores. So the fact they're financially interchangeable is meaningless for capacity planning.

Arguably, for most code and especially in a GCd language, using less memory and less CPU go hand-in-hand.

If you are in aggregate making good use of all the dimensions of the available machines/VMs, great. I think often people either leave one dimension unused or (when buying their own hardware / selecting a VM shape) could be adding more RAM cheaply.

> Arguably, for most code and especially in a GCd language, using less memory and less CPU go hand-in-hand.

Agreed in general. Even in a non-GC language, less dense data structures means worse CPU cache utilization. But on the other hand, memoization and the like can provide a real trade-off.

In this case, I don't think it's costing much CPU. The GC isn't traversing beyond the bounds of the stack, and it mostly shouldn't end up in the CPU cache either. (Just a partial cache line at the boundary, and some more after a goroutine's stack shrinks or the goroutine exits.)

> I think most engineers are way too stingy with RAM. It often makes sense to use more of it to reduce CPU, and to just spend it on developer convenience.

Hey! That's Java's argument!

Why is spending GB on stack space a bad thing? Ultimately, in a server, you need to store state for each request. Whether that's on the stack or heap, it's still memory that necessarily has to be used.

If you need the stack space, then there is no difference. The difference arises because if you preallocate all that stack space using worst-case stack sizes and don't use most of it, you've wasted lots of memory.

Also there is a ton of nuance here like overcommitted pages and large address spaces which mitigate some of those downsides.

Except, Go doesn't do that. It grows stacks as you use them and shrinks them if you stop using so much, so the overhead should be limited. FWIW, heap allocations also come with memory overhead.
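One way to see that a goroutine's small initial stack is a starting point rather than a limit is to recurse deeply inside a fresh goroutine; the runtime grows the stack as it fills. A minimal sketch; `depth` is a hypothetical function, and the recursion depth is arbitrary but well under Go's default maximum stack size (1 GB on 64-bit systems):

```go
package main

import "fmt"

// depth (hypothetical) recurses n times. Every call adds a stack frame, so
// depth(1_000_000) needs megabytes of stack, far beyond the goroutine's
// initial allocation; the runtime grows the stack without any help from us.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

func main() {
	result := make(chan int)
	go func() { result <- depth(1_000_000) }() // runs on a brand-new goroutine stack
	fmt.Println(<-result)                      // 1000000
}
```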

Right, yes, Go does have some nice ways of handling this problem. I was speaking more generally about using that much stack space vs. heap space for threads in general, but I should have realized that this specific thread was more about Go, and perhaps my comment wasn't as useful in that context.

Despite popular belief, not everything is a (web) server. I can imagine many threads being appealing in, e.g., simulations.

I definitely can’t hire anyone in this thread to work on cell phone performance. We fight for 10 KB of memory, and yes, we are still doing this in 2022.

Even on a server, you may have TBs of RAM but you don’t have that much L1 cache nor that much memory bandwidth.

Why would you need hundreds of thousands or millions of goroutines for a cell phone app/daemon?

I would expect the number (and corresponding memory usage) to therefore be low.

The only numbers in programming are one, two, and many.

So if you’re not very careful and nothing stops you, it’s pretty easy to create an unbounded amount of anything.

By that logic, allowing more than 2 byte allocations is a mistake.

Sure, but my point is that if you want to run 100k-1m things concurrently, you need to store state for them.

That has a memory cost no matter what.
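To put a number on that cost for a particular Go version, one rough approach is to park many goroutines and compare `runtime.MemStats.StackInuse` before and after. A sketch under stated assumptions: `stackBytesPerGoroutine` is a hypothetical helper, the figure counts only stack spans (not the runtime's other per-goroutine bookkeeping), and results will vary by Go version and platform:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackBytesPerGoroutine (hypothetical) parks n goroutines on a channel and
// returns the average increase in runtime.MemStats.StackInuse per goroutine.
// Only stack spans are counted, not other per-goroutine runtime state.
func stackBytesPerGoroutine(n int) float64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	release := make(chan struct{})
	var started, done sync.WaitGroup
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-release // stay parked so the stack stays allocated
		}()
	}
	started.Wait() // every goroutine has been created and has started
	runtime.ReadMemStats(&after)

	close(release)
	done.Wait()

	// Convert before subtracting so a shrinking figure cannot wrap around.
	return float64(int64(after.StackInuse)-int64(before.StackInuse)) / float64(n)
}

func main() {
	for _, n := range []int{1_000, 100_000} {
		fmt.Printf("%7d goroutines: ~%.0f bytes of stack each\n", n, stackBytesPerGoroutine(n))
	}
}
```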

If you asked me what “a few kb” times “hundreds of thousands” is, I’d have characterized it as “more than a few hundreds of thousands of kb”, not necessarily “gigabytes,” and that doesn’t sound impractical at all. My JVM heaps are usually 16GB.
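The arithmetic is easy to check. Assuming Go's 2 KiB minimum starting stack, with 8 KiB as an example of a stack that has grown, a small helper (`stackTotalMiB`, a hypothetical name) gives the totals:

```go
package main

import "fmt"

// stackTotalMiB (hypothetical) returns the combined stack size, in MiB, of
// n goroutines that each hold perGoroutineKiB of stack.
func stackTotalMiB(n, perGoroutineKiB int) float64 {
	return float64(n) * float64(perGoroutineKiB) / 1024
}

func main() {
	for _, n := range []int{100_000, 500_000, 1_000_000} {
		// 2 KiB: Go's minimum starting stack; 8 KiB: an example grown stack.
		for _, kib := range []int{2, 8} {
			fmt.Printf("%9d goroutines x %d KiB = %8.1f MiB\n", n, kib, stackTotalMiB(n, kib))
		}
	}
}
```

At 2 KiB each, 100,000 goroutines come to about 195 MiB and 1,000,000 to about 1.9 GiB, so reaching several gigabytes takes either very large counts or stacks that have grown well past the minimum.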

And Go actually does a pretty good job of scheduling hundreds of thousands of threads. 6 months ago I did some fairly abusive high-thread-count experiments solving problems in silly ways that required all N goroutines to participate, and I didn’t see much perf falloff on my laptop until I got to 1.5-2 million goroutines.
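A rough way to try something similar, assuming nothing about the original experiments beyond that description: start N goroutines, hold them at a barrier so all of them exist at the same time, then release them to contribute to a shared result and time the whole thing. `fanOutSum` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// fanOutSum (hypothetical) starts n goroutines, holds them at a barrier so
// they are all alive at once, then releases them to add their index to a
// shared total. It returns the total and the elapsed wall-clock time.
func fanOutSum(n int) (int64, time.Duration) {
	start := time.Now()
	begin := make(chan struct{})
	var total int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(v int64) {
			defer wg.Done()
			<-begin // no goroutine proceeds until all n have been created
			atomic.AddInt64(&total, v)
		}(int64(i))
	}
	close(begin)
	wg.Wait()
	return atomic.LoadInt64(&total), time.Since(start)
}

func main() {
	for _, n := range []int{1_000, 10_000, 100_000} {
		sum, elapsed := fanOutSum(n)
		fmt.Printf("%7d goroutines: sum=%d in %v\n", n, sum, elapsed)
	}
}
```

Timings depend heavily on hardware, `GOMAXPROCS`, and Go version, so the sketch prints them rather than claiming any particular falloff point.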

In addition to the other comments about memory usage, I’ll mention that there is a proposal (that’s either going to make it into Go 1.19 or 1.20?) that uses heuristics to determine a good starting stack size for goroutines.

My experience is that whatever you’re doing with the goroutine is usually a bottleneck before the goroutine itself. E.g., if you make a network request, you become network-bound before you become memory-bound from goroutines.