2KB is tiny; that must be a default which grows or is extended as required? Obviously it works out okay for them, but I feel like it would add nontrivial overhead to every function call if you always have to check whether you need to grow your stack to accommodate the new frame.
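For what it's worth, here is a minimal sketch of the "starts small and grows" behaviour, assuming the 2KB figure refers to Go's goroutine stacks (the comment does not name the runtime, so that is a guess). As far as I know, the Go compiler emits a short stack-bound check in the prologue of most functions and the runtime copies the stack to a larger one when the check fails. The names `sumTo` and `deepInGoroutine` are made up for illustration.

```go
package main

import "fmt"

// sumTo recurses n levels deep without tail calls, so every level
// needs its own stack frame. For large n that is far more than 2KB.
func sumTo(n int64) int64 {
	if n == 0 {
		return 0
	}
	return n + sumTo(n-1)
}

// deepInGoroutine runs sumTo on a fresh goroutine, which starts with
// a small stack, and returns the result over a channel.
func deepInGoroutine(depth int64) int64 {
	result := make(chan int64)
	go func() { result <- sumTo(depth) }()
	return <-result
}

func main() {
	// 100,000 frames cannot fit in a small initial stack; this only
	// completes because the runtime grows the stack on demand.
	fmt.Println(deepInGoroutine(100000)) // prints 5000050000
}
```

The check itself is a compare and a branch that is almost never taken, which would explain why the per-call cost stays small in practice.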
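Linux does not immediately commit 2MB of physical memory for every thread's stack. That would be ridiculous. The address range is only reserved, and pages are faulted in as they are first touched. Look in /proc/<pid>/smaps to see how much space your stacks actually occupy.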
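If you want to check this from inside a process, a rough sketch follows. It reads /proc/self/smaps and compares the `Size` (mapped address range) and `Rss` (resident pages) of the mapping labelled `[stack]`. Two caveats: `stackUsage` is a hypothetical helper written for this comment, and on current kernels `[stack]` marks only the main thread's stack, so the stacks of other threads show up as ordinary anonymous mappings.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// stackUsage scans smaps-formatted text and returns the Size and Rss
// values (in kB) of the mapping whose header line ends in "[stack]".
func stackUsage(smaps string) (sizeKB, rssKB int, found bool) {
	inStack := false
	for _, line := range strings.Split(smaps, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		// Attribute lines start with "Key:"; anything else is a
		// mapping header such as "7ffd...-7ffd... rw-p ... [stack]".
		if !strings.HasSuffix(fields[0], ":") {
			inStack = fields[len(fields)-1] == "[stack]"
			found = found || inStack
			continue
		}
		if !inStack || len(fields) < 2 {
			continue
		}
		value, err := strconv.Atoi(fields[1])
		if err != nil {
			continue // e.g. "VmFlags: rd wr ..." has no number
		}
		switch fields[0] {
		case "Size:":
			sizeKB = value
		case "Rss:":
			rssKB = value
		}
	}
	return sizeKB, rssKB, found
}

func main() {
	data, err := os.ReadFile("/proc/self/smaps")
	if err != nil {
		fmt.Println("cannot read smaps (not Linux?):", err)
		return
	}
	size, rss, ok := stackUsage(string(data))
	if !ok {
		fmt.Println("no [stack] mapping found")
		return
	}
	fmt.Printf("main stack: %d kB mapped, %d kB resident\n", size, rss)
}
```

The resident figure is the one that costs real memory, and it is typically much smaller than the configured stack limit.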