Hacker News new | ask | show | jobs
by arielweisberg 4415 days ago
I think that thread scheduling is really not the high pole in the tent for large numbers of threads. It's stacks. If you want to have a million stacks and each is allocated to the highest watermark that thread ever reached you will run out of memory.

I suspect that task size has to be smaller than the typical bit of web processing code for context switching overhead to really dominate, or even be expensive enough to matter. That says as much about how expensive common tools and frameworks are as it does about the cost of context switching.

1 comments

That is kind of the issue in Go at the moment. They have been oscillating between default stack sizes and also switched from green threads to real OS threads backing goroutines.

EDIT: scratch the last statement about Go using OS threads for goroutines. I was thinking of something else.

Sometimes on Linux systems, even allocating a large stack size doesn't actually consume that as physical memory. Malloc might give you the memory block, but until you actually make function calls or allocate data on the stack, it might not consume the memory.

Someone should really compile a list of all the non-toy languages/environments that started with green threads and switched to native threads, with explanations of what they were trying to do and what happened and why they switched. It seems to be a popular path. Before the next person thinks "oh, this would be so simpler and more problem-free if I use green threads", that person should really review the prior art, heh.
Usually, when people say "green threads" they mean 1:N scheduling (i.e. the application consumes one kernel threads, and schedules N threads on top of it). Lightweight threads, on the other hand, employ M:N scheduling, i.e. they use multiple kernel threads and schedule even more fibers on top. That's the approach taken by Erlang, Go, and Quasar. And Erlang has done this well for a long time.

Now, fibers are not perfect: their main shortcoming is integration with legacy code. The optimal approach is scheduler activations, or "user assisted" kernel scheduling.

On the one hand, I've observed that operating system implementers have repeatedly tried and rejected green threads for their pthread implementations (Solaris many-to-many threads and FreeBSD KSEs are both now historical footnotes). In Linux around 2002, there was a new many-to-many pthread implementation called NGPT that was backed by big players like IBM and Intel, until a couple of Red Hat developers (Ulrich Drepper and Ingo Molnar) obsoleted it with NPTL and some accompanying kernel optimizations. So at the OS level, a 1-to-1 correspondence between kernel-level and user-level threads seems to be what mature implementations settle on. Also, early JVM implementations for Unix used green threads, but that's also now a historical footnote.

On the other hand, Go does use green threads, and the main developers of Go are by no means naive. So maybe green threads do have some benefit, but only when implemented in something higher-level than libc and pthread.

It seems like Go uses it's own threads but backed by OS threads, rather than actually pure green threads. Yes?

But yeah, a whole lot of people seem to think "man, OS threads suck, surely we can do better with green threads," only to find out why OS threads suck, and that many developer-years of work have gone into making them not suck more.

OS threads by no means suck, but they can't make certain assumptions that lightweight threads can about thread behavior. They're great for a lot of processing, but not so great when they have to constantly block and unblock.
IIRC, SQL Server was their first product to use this.
Relevant HN thread:

https://news.ycombinator.com/item?id=6977177

BTW, I wonder if Paul Turner's work on fibers will change this equation if/when it makes it into the production kernel.

https://www.youtube.com/watch?v=KXuZi9aeGTw

Google has had very promising initial results with it.

I've been wondering about this as well. Goroutines start with an 8KB stack size now; native threads seem to use an 8MB _virtual_ stack size of which often only one or two pages (of 4KB) will be used. So is the primary remaining advantage of goroutines vs threads the efficiency of the go scheduler?
FWIW, musl libc uses a much smaller default thread stack size (80KB). But apparently, Rich Felker wasn't able to find any good data on how big thread stacks actually need to be (see http://git.musl-libc.org/cgit/musl/commit/?id=13b3645c46518e...).

EDIT: So I guess we should run more big multi-threaded server apps with musl and see what breaks.

When did Go switch from green threads to real OS threads backing goroutines, and why? Can you at least point to a mailing list post or version control commit?
Sorry apparently that is not the case, I am an idiot, I was thinking of something else.
Deferred allocation of stack pages only buys you so much. Each thread is going to burst usage causing stack space to be committed and once it is committed the memory is gone until the thread is reclaimed.

If you are in a million thread scenario it is usually because most of them are idle and retaining state for some eventual activity. That also tends to mean that they are long lived.

This applies to both goroutines & native threads, right? I guess with both you could periodically re-shrink the stack / release memory back to the OS, but I don't think either go or any native threading system I'm aware of does that.

It would be interesting to do this for native threads at the kernel level: it's very similar to swapping, except for stack data you could just throw the data away if you could figure out that it was indeed dead.

go is a moving target, but I believe it is committed to having segmented stacks that shrink.

Native threads don't have a way to reclaim stack memory other then swap. My personal opinion is that this should be solved in the kernel and that kernel threads should have an option for reclaiming stack space so I can run as many as I want.