This is demonstrably not true. That address space isn’t allocated until it is used. Even with initial allocations of 4kB pages, there’s plenty more space than you’re estimating.
Again, there's plenty of address space in 48 bits, but you need to commit physical memory in order to allocate the space for the page table for that address space. And that's per-process, and it's going to thrash the TLB.
That's not how the OS's page management tables work. The OS assigns space for the stack at a (typically random) location in the process' address space, but no physical allocation is made. The very first time the stack is used a page fault exception is generated, causing a context switch to the OS. Only then does the memory management subsystem allocate a page for the stack and return control back to the program.
Handling memory allocation lazily like this is necessary to handle a number of edge cases, such as spinning up a massive number of short-lived threads. It also prevents thrashing of the TLB cache.
In practice, real operating systems finely tune their behavior here. I would not be surprised at all if a 4kB allocation is made for a thread's stack upon creation in modern operating systems. But I would be very surprised if, e.g., Linux allocated a full 1MB of memory at thread creation time instead of handling the vast majority of it lazily.
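If anyone wants to see the lazy behavior for themselves, here is a minimal sketch. It assumes Linux (it reads `/proc/self/statm`) and uses an anonymous private mapping as a stand-in for a thread stack; the helper names `rss_bytes` and `reserve_then_touch` are made up for illustration, and the exact numbers will vary with kernel settings such as transparent huge pages.

```python
import mmap

PAGE = mmap.PAGESIZE


def rss_bytes():
    """Resident set size of this process, from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * PAGE


def reserve_then_touch(reserve_bytes, touch_bytes):
    """Map reserve_bytes of anonymous memory, then write to the first touch_bytes.

    Returns the RSS before mapping, after mapping, and after touching.
    """
    before = rss_bytes()
    # Reserving address space: no physical pages are assigned yet.
    region = mmap.mmap(-1, reserve_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_map = rss_bytes()
    # The first write to each page faults, and the kernel backs it on demand.
    for offset in range(0, touch_bytes, PAGE):
        region[offset] = 1
    after_touch = rss_bytes()
    region.close()
    return before, after_map, after_touch


if __name__ == "__main__":
    mib = 1024 * 1024
    before, after_map, after_touch = reserve_then_touch(256 * mib, 16 * mib)
    print("RSS growth after mmap:  %.1f MiB" % ((after_map - before) / mib))
    print("RSS growth after touch: %.1f MiB" % ((after_touch - after_map) / mib))
```

On a typical Linux box the first number should be close to zero and the second close to the amount touched, which is the reserve-now, commit-on-fault behavior described above.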
EDIT: Oh wait, I think you were mostly agreeing with me :) Yes, my original comment did mess up address allocation vs physical page allocation due to a brain fart. I meant it the other way around and I think we're saying nearly the same thing.
The one major point of difference is that to make an address allocation in the page table doesn't require a physical allocation. The OS can either leave that allocated range unmapped, or map it with a protected page table entry. In either case an access faults, and before killing the process with a segfault the OS first looks in its internal deferred-allocation tables to see whether the access was to a reserved area of the address space whose physical allocation was deferred.
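To make the fault path concrete, here is a toy model of that decision. It is only a sketch of the idea, not how any real kernel is structured: `ToyAddressSpace`, `reserve`, `access`, and `SegFault` are all hypothetical names, and a real OS tracks reservations in far more elaborate structures.

```python
PAGE = 4096


class SegFault(Exception):
    """Raised when an access falls outside every reserved range."""


class ToyAddressSpace:
    def __init__(self):
        self.reserved = []   # (start, end) ranges handed out but not yet backed
        self.mapped = set()  # page numbers that currently have a physical frame

    def reserve(self, start, length):
        # Address allocation only: record the range, map nothing.
        self.reserved.append((start, start + length))

    def access(self, addr):
        page = addr // PAGE
        if page in self.mapped:
            return "hit"
        # Page fault: consult the reservation list before giving up.
        if any(start <= addr < end for start, end in self.reserved):
            self.mapped.add(page)  # back the page on demand
            return "demand-mapped"
        raise SegFault(hex(addr))


if __name__ == "__main__":
    space = ToyAddressSpace()
    space.reserve(0x70000000, 1024 * 1024)  # a 1 MiB "stack"
    print(space.access(0x70000010))         # first touch of the page
    print(space.access(0x70000020))         # same page, now backed
    print(len(space.mapped), "page(s) backed")
```

The point of the model is that reserving 1 MiB costs one list entry, and only the single page actually touched ends up backed.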
> The one major point of difference is that to make an address allocation in the page table doesn't require a physical allocation
No, it does require a physical allocation. The page table entry for the new virtual memory has to live in physical memory!
People say 'you can have 256 TB of virtual memory'! Yes you can, but you will need about 512 GB of physical memory to hold the page table for that, won't you, assuming 4 KB pages and 8-byte entries, even if none of that 256 TB is committed to physical memory.
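The arithmetic for a fully populated table is easy to check. This sketch counts only the leaf-level entries and assumes 8-byte entries, as in x86-64 4-level paging; the function name is hypothetical. A figure of 256 GiB would correspond to 4-byte entries.

```python
TIB = 2**40
GIB = 2**30


def leaf_page_table_bytes(virtual_bytes, page_size=4096, entry_size=8):
    """Bytes of leaf page-table entries needed to map every page in the range."""
    pages = virtual_bytes // page_size
    return pages * entry_size


if __name__ == "__main__":
    for entry_size in (8, 4):
        size = leaf_page_table_bytes(256 * TIB, entry_size=entry_size)
        print("%d-byte entries: %d GiB" % (entry_size, size // GIB))
```

Note that this is the cost of mapping every page of the 256 TiB, which is the scenario claimed here.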
You say 'that's not how the OS's page management tables work' - yes it is! Look up Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3, Chapter 4, formats of page-directory entry and page-table entry. It's committed memory.
You don't have to allocate page-table pages for an area of memory that is not physically allocated at all. Yes, accessing it will trap to the operating system, but that's the entire point.
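Here is a rough way to count what a sparse mapping costs. It assumes x86-64 style 4-level paging (9 index bits per level, 4 KiB pages) and that table pages exist only along the paths to touched pages; `table_pages_needed` and the base address are made up for the example.

```python
def table_pages_needed(touched_addrs, levels=4, bits_per_level=9, page_shift=12):
    """Count the distinct page-table pages needed to map the touched addresses."""
    tables = set()
    for addr in touched_addrs:
        vpn = addr >> page_shift
        for level in range(1, levels + 1):
            # Identify each table by its level and the address prefix it covers.
            tables.add((level, vpn >> (bits_per_level * level)))
    return len(tables)


if __name__ == "__main__":
    mib = 1024 * 1024
    base = 0x700000000000  # arbitrary user-space address, 1 GiB aligned
    # 1000 thread stacks reserved 1 MiB apart, one touched page in each.
    stacks = [base + i * mib for i in range(1000)]
    n = table_pages_needed(stacks)
    print(n, "table pages, about", n * 4096 // 1024, "KiB")
```

Under these assumptions, 1000 stacks with one touched page each need 503 table pages (about 2 MiB), nowhere near the cost of mapping the full reservations.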
Even 4k is a lot, and whatever memory is committed stays committed (and worse, it may even be paged). Perhaps it would be possible for the kernel to uncommit unused stack pages, but I don't think kernels want to assume threads never access memory below their stack pointer.
Oh I agree OS threads are really heavyweight and there is a real need for fibers / green threads. I think we're arguing over whether they are 100x more efficient or 10,000x more efficient. Details matter ;) Thanks for your work on this for Java!
But allocating the address space even if it is not committed still consumes finite resources and still limits the number of threads you can create.
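As a back-of-the-envelope bound on that, here is the ceiling from address space alone. It assumes a 47-bit user address space (x86-64 Linux with 4-level paging) and ignores guard pages and everything else living in the process; the function name is hypothetical.

```python
MIB = 2**20
USER_SPACE = 2**47  # assumption: x86-64 Linux, 4-level paging


def max_threads_by_address_space(user_space_bytes, stack_reserve_bytes):
    """Upper bound on thread count if stacks were the only thing mapped."""
    return user_space_bytes // stack_reserve_bytes


if __name__ == "__main__":
    for stack_mib in (1, 8):
        n = max_threads_by_address_space(USER_SPACE, stack_mib * MIB)
        print("%d MiB stacks: at most %d threads" % (stack_mib, n))
```

That ceiling is in the tens of millions or more, so in practice other finite resources tend to bite first, for example per-thread kernel structures and the per-process mapping limit (`vm.max_map_count` on Linux, 65530 by default on many systems).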