But wouldn't the consequence of spawning millions of native threads be to spill those virtual addresses into many, many pages and incur many more TLB misses and page faults generally?
Only the portions of the stack that are actually touched trigger a TLB miss and page fault. If you've got a million threads but each only touches the first 4K of stack space, you end up only touching 4G of RAM even though you've mapped 1T of it.