Hacker News
by avodonosov 2539 days ago
OS threads use a statically sized stack. This limits the number of threads one can have (say, if the stack is 1 MB and you have 1 GB of memory, you can have at most 1024 threads). That's the main difference. I think some savings also come from avoiding full-blown context switches.
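The arithmetic behind that estimate, as a minimal Java sketch. It assumes every thread is charged its full stack reservation up front, which is exactly the assumption the replies below dispute. The four-argument `Thread` constructor is real, but the JVM treats its `stackSize` argument only as a hint; the class name `StackBudget` is hypothetical.

```java
public class StackBudget {
    // Naive upper bound (assumption): each thread costs its whole stack size in real memory.
    public static long naiveMaxThreads(long memoryBytes, long stackBytes) {
        return memoryBytes / stackBytes;
    }

    public static void main(String[] args) throws InterruptedException {
        long oneGiB = 1L << 30;
        long oneMiB = 1L << 20;
        System.out.println(naiveMaxThreads(oneGiB, oneMiB)); // 1024

        // Request a 1 MiB stack for a single thread; the JVM may round or ignore this hint.
        Thread t = new Thread(null, () -> System.out.println("running"), "worker", oneMiB);
        t.start();
        t.join();
    }
}
```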

You only lose address space as long as that stack memory isn't used, right?
Good point, yes. As mentioned in a sibling comment, even 4k for one page of actually allocated memory is significant.
This is demonstrably not true. That address space isn’t allocated until it is used. Even with initial allocations of 4kB pages, there’s plenty more space than you’re estimating.
No, you have it backwards: the address space is always allocated. The physical memory is not allocated (we say committed) until it is used.

But allocating the address space even if it is not committed still consumes finite resources and still limits the number of threads you can create.

Yeah, but there’s plenty of space to put 2 MB stacks in 48 bits (that’s 256 TB) of address space.
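A quick check of how many 2 MB stack reservations fit in a 48-bit address space, taking the full 2^48 bytes at face value (in practice the kernel keeps part of that range for itself, so user space gets less). The class and method names are hypothetical.

```java
public class AddressSpaceBudget {
    // How many fixed-size reservations fit in an address space 2^addressBits bytes wide.
    public static long reservationsThatFit(int addressBits, long reservationBytes) {
        return (1L << addressBits) / reservationBytes;
    }

    public static void main(String[] args) {
        long twoMiB = 2L << 20;
        // 2^48 / 2^21 = 2^27 reservations, ignoring the kernel's share of the range
        System.out.println(reservationsThatFit(48, twoMiB)); // 134217728
    }
}
```

So address space alone allows on the order of a hundred million such stacks; the disagreement that follows is about what else each reservation costs.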
Again, there's plenty of address space in 48 bits, but you need to commit physical memory in order to allocate the space for the page table for that address space. And that's per-process, and it's going to thrash the TLB.
That's not how the OS's page management tables work. The OS assigns space for the stack at a (typically random) location in the process' address space, but no physical allocation is made. The very first time the stack is used a page fault exception is generated, causing a context switch to the OS. Only then does the memory management subsystem allocate a page for the stack and return control back to the program.

Handling memory allocation lazily like this is necessary to handle a number of edge cases, such as spinning up a massive number of short-lived threads. It also prevents thrashing the TLB.

In practice, real operating systems finely tune their behavior here. I would not be surprised at all if a 4kB allocation is made for a thread's stack upon creation in modern operating systems. But I would be very surprised if, e.g., Linux allocated a full 1MB of memory at thread creation time instead of handling the vast majority of it lazily.

EDIT: Oh wait, I think you were mostly agreeing with me :) Yes, my original comment did mess up address allocation vs physical page allocation due to a brain fart. I meant it the other way around and I think we're saying nearly the same thing.

The one major point of difference is that making an address allocation in the page table doesn't require a physical allocation. The OS can either leave that allocated space unconfigured or mark it as protected. In either case, an access to it faults, and before killing the process with a core dump, the OS first looks in its internal deferred-allocation tables to see whether the access was to an allocated area of the address space whose physical allocation was deferred.
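The fault-handling flow described above can be modelled with a toy Java class. This is a hypothetical simulation, not how any particular kernel is written: `LazyAddressSpace`, `reserve`, and `touch` are invented names, regions are treated as flat ranges (real stacks grow downward and usually have a guard page), and "committing" a page just means recording it in a set.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model (hypothetical) of demand paging for reserved-but-uncommitted regions. */
public class LazyAddressSpace {
    public static final long PAGE = 4096;

    public enum Access { HIT, FAULT_COMMITTED, SEGFAULT }

    // start address -> length of each reserved region (the "deferred-allocation table")
    private final TreeMap<Long, Long> reserved = new TreeMap<>();
    // page numbers that have been given a physical frame
    private final Set<Long> committed = new HashSet<>();

    /** Reserve address space only; no page is committed yet. */
    public void reserve(long start, long length) {
        reserved.put(start, length);
    }

    /** Simulate one memory access at addr. */
    public Access touch(long addr) {
        long page = addr / PAGE;
        if (committed.contains(page)) {
            return Access.HIT;                 // already backed: no fault
        }
        // "Page fault": consult the reserved regions before giving up on the process.
        Map.Entry<Long, Long> region = reserved.floorEntry(addr);
        if (region != null && addr < region.getKey() + region.getValue()) {
            committed.add(page);               // back exactly one page on first use
            return Access.FAULT_COMMITTED;
        }
        return Access.SEGFAULT;                // not reserved: the process would be killed
    }

    public long committedBytes() {
        return committed.size() * PAGE;
    }

    public static void main(String[] args) {
        LazyAddressSpace as = new LazyAddressSpace();
        long stackBase = 0x7f00_0000_0000L;
        as.reserve(stackBase, 1L << 20);                      // 1 MiB "stack"
        System.out.println(as.committedBytes());              // 0
        System.out.println(as.touch(stackBase + 100));        // FAULT_COMMITTED
        System.out.println(as.touch(stackBase + 200));        // HIT (same page)
        System.out.println(as.touch(stackBase + (1L << 20))); // SEGFAULT (one past the end)
        System.out.println(as.committedBytes());              // 4096
    }
}
```

Reserving 1 MiB costs nothing in the model until the first touch, after which only one 4 KiB page is backed; the model deliberately says nothing about page-table memory, which is the point disputed in the next reply.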

> The one major point of difference is that to make an address allocation in the page table doesn't require a physical allocation

No it does require a physical allocation. The page table entry for the new virtual memory needs to be a physical allocation!

People say 'you can have 256 TB of virtual memory'! Yes, you can, but you will need about 512 GB of physical memory to hold the page tables for that (2^36 eight-byte entries, assuming 4 KB pages), won't you, even if none of that 256 TB is committed to physical memory.
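The page-table arithmetic for the case where every 4 KB page of a 48-bit space has a leaf entry, as a Java sketch. It assumes x86-64 four-level paging with 8-byte entries and 512 entries per table, as in the Intel SDM formats cited below; it computes only this fully mapped worst case and says nothing about whether an OS populates those entries eagerly. `PageTableCost` is a hypothetical name.

```java
public class PageTableCost {
    static final int PAGE_SHIFT = 12;     // 4 KiB pages
    static final int ENTRY_BYTES = 8;     // x86-64 paging-structure entries are 8 bytes
    static final int BITS_PER_LEVEL = 9;  // 512 entries per 4 KiB table

    /** Bytes of leaf entries needed if every page of a 2^addressBits space is mapped. */
    public static long leafTableBytes(int addressBits) {
        long pages = 1L << (addressBits - PAGE_SHIFT);
        return pages * ENTRY_BYTES;
    }

    /** Leaf level plus the upper levels (PD, PDPT, PML4 when levels = 4). */
    public static long allLevelsBytes(int addressBits, int levels) {
        long entries = 1L << (addressBits - PAGE_SHIFT);
        long total = 0;
        for (int i = 0; i < levels; i++) {
            total += entries * ENTRY_BYTES;
            entries >>= BITS_PER_LEVEL;   // each upper-level entry covers 512 lower entries
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(leafTableBytes(48) >> 30);    // 512 (GiB)
        System.out.println(allLevelsBytes(48, 4) >> 30); // 513 (GiB, rounded down)
    }
}
```

The upper three levels add only about 1 GiB on top of the leaf level, so the leaf entries dominate.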

You say 'that's not how the OS's page management tables work'. Yes, it is! Look up the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3, Chapter 4, for the formats of page-directory and page-table entries. It's committed memory.

These are good points, thank you.

BTW, I dug a little deeper: Java actually commits the stack memory upon thread creation.

The memory breakdown of a Java process can be seen using `jcmd <pid> VM.native_memory` (the JVM must be started with `-XX:NativeMemoryTracking=summary` or `=detail`), as suggested here: http://xmlandmore.blogspot.com/2014/09/jdk-8-thread-stack-si...

But thanks to Linux memory overcommit, this memory is not actually allocated.

I've managed to create 127,000 Java threads on my machine, and the resident memory of this process, as shown by `top`, is 2.167G, or about 17 KB per thread.

If I disable memory overcommit, only around 32,000 threads are created.
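A scaled-down sketch of that experiment in Java: start up to N platform threads that block on a latch, count how many actually started before thread creation failed, then release them. `ThreadCensus`, `spawnParked`, and `residentBytesPerThread` are hypothetical names. Where a real run stops depends on the machine (overcommit mode, `ulimit -u`, `/proc/sys/kernel/threads-max`, stack size), so the sketch reproduces the method, not the numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ThreadCensus {
    /** Starts up to target parked threads; returns how many started before creation failed. */
    public static int spawnParked(int target, long stackSizeHint) {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> started = new ArrayList<>();
        try {
            for (int i = 0; i < target; i++) {
                Thread t = new Thread(null, () -> {
                    try {
                        release.await();               // park until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, "parked-" + i, stackSizeHint);      // 0 means "use the JVM default"
                t.start();
                started.add(t);
            }
        } catch (OutOfMemoryError e) {
            // Typically "unable to create native thread": the OS refused another thread.
        } finally {
            release.countDown();                       // let every parked thread finish
            for (Thread t : started) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return started.size();
    }

    /** Resident memory divided evenly across threads, in bytes. */
    public static long residentBytesPerThread(long residentBytes, int threads) {
        return residentBytes / threads;
    }

    public static void main(String[] args) {
        System.out.println(spawnParked(1_000, 0));
        // Figures from the comment above: about 2.167 GB resident for 127,000 threads.
        System.out.println(residentBytesPerThread(2_167_000_000L, 127_000)); // 17062
    }
}
```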

Even 4k is a lot, and whatever memory is committed stays committed (and worse, it may even be paged out). Perhaps it would be possible for the kernel to uncommit unused stack pages, but I don't think kernels want to assume threads never access memory below their stack pointer.
Oh, I agree OS threads are really heavyweight, and there is a real need for fibers / green threads. I think we're arguing over whether they are 100x more efficient or 10,000x more efficient. Details matter ;) Thanks for your work on this for Java!
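For reference, fibers for Java eventually shipped as virtual threads, finalized in JDK 21 (JEP 444). A minimal sketch, assuming JDK 21 or later, that runs many blocking tasks each on its own virtual thread; `VirtualThreadsDemo` and the task count are illustrative choices, and the example does not settle the 100x versus 10,000x question.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadsDemo {
    /** Runs n tasks, each on its own virtual thread, and returns how many completed. */
    public static int runOnVirtualThreads(int n) {
        AtomicInteger done = new AtomicInteger();
        // try-with-resources: close() waits for all submitted tasks to finish.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                executor.submit(() -> {
                    try {
                        // Blocking parks the virtual thread instead of holding an OS thread.
                        Thread.sleep(Duration.ofMillis(10));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runOnVirtualThreads(100_000)); // 100000
    }
}
```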
I think you severely overestimate the efficiency gain both times.