It does seem rather surprising to me that Windows is so broken that a program can fail to reserve 512MB of virtual address space as the very first thing it does at launch.
Of course, the 2GB limit on 32-bit Windows programs also seems terribly broken to me.
Yes, it is, because PAE doesn't change the address space of processes. PAE lets you use more physical memory, but the address space of each process stays the same, because 2^32 is still 2^32 with PAE.
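To make the arithmetic concrete, here is a small sketch comparing the per-process virtual address space with the physical address range PAE opens up. It assumes the classic 36-bit PAE physical address width and the default 32-bit Windows split, where user mode gets half of the 4 GiB virtual space:

```python
GIB = 2**30

def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2**bits

# Per-process virtual address space on 32-bit Windows; PAE does not change it.
virtual = addressable_bytes(32)
# PAE widens *physical* addresses (assumed 36 bits here), not virtual ones.
physical_with_pae = addressable_bytes(36)
# Default user-mode share of the virtual space; the other half is kernel space.
user_default = virtual // 2

print(f"virtual per process: {virtual // GIB} GiB")
print(f"physical with PAE:   {physical_with_pae // GIB} GiB")
print(f"default user-mode:   {user_default // GIB} GiB")
```

So PAE lets the machine hold up to 64 GiB of RAM, yet each process still only has 4 GiB of addresses to work with, 2 GiB of them usable by default.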
While this is a correct observation, the fact that 512MB of contiguous address space is already too fragmented that early at launch does not reflect the most clever operating system design I have seen.
It suggests that while you can get 2GB (or 3GB, ...), you cannot get a contiguous region larger than X megabytes.
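A toy model shows how this can happen: plenty of free address space in total, but no single gap big enough. The layout below is entirely hypothetical (made-up addresses for the executable and a few DLLs in a 2 GiB user space); it is only meant to illustrate the distinction between total free space and the largest contiguous free block:

```python
MIB = 2**20

def free_gaps(space_size, used):
    """Return (start, length) free gaps, given non-overlapping (start, length) used regions."""
    gaps = []
    cursor = 0
    for start, length in sorted(used):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < space_size:
        gaps.append((cursor, space_size - cursor))
    return gaps

# Hypothetical 2 GiB user address space with a few small modules mapped
# at scattered addresses. All sizes and addresses are invented.
space = 2048 * MIB
used = [
    (0, 64 * MIB),           # low reserved range + executable image
    (400 * MIB, 8 * MIB),    # a DLL
    (800 * MIB, 8 * MIB),    # another DLL
    (1200 * MIB, 8 * MIB),
    (1600 * MIB, 8 * MIB),
]

gaps = free_gaps(space, used)
total_free = sum(length for _, length in gaps)
largest = max(length for _, length in gaps)
print(f"total free: {total_free // MIB} MiB, largest gap: {largest // MIB} MiB")
```

Only 96 MiB is in use, so about 1.9 GiB is free, yet the largest gap is 440 MiB and a 512 MiB contiguous reservation cannot succeed.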
That said, it is a rather odd limitation of the garbage collector as well. Most GCs work around the problem by being able to allocate the heap in multiple separate chunks. Still, the one-large-chunk approach is by far the simplest and fastest solution to the problem.
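The difference between the two strategies can be sketched as follows. The functions `reserve_single` and `reserve_chunked`, the 64 MiB chunk size, and the gap list are all hypothetical, not any real GC's API; the point is only that fixed-size chunks can fit into gaps that a single large reservation cannot:

```python
MIB = 2**20

def reserve_single(gaps, size):
    """Place one contiguous block of `size` bytes; return its start, or None."""
    for start, length in gaps:
        if length >= size:
            return start
    return None

def reserve_chunked(gaps, size, chunk):
    """Place `size` bytes as fixed-size chunks across any gaps; return chunk starts, or None."""
    needed = -(-size // chunk)  # ceiling division
    starts = []
    for start, length in gaps:
        offset = 0
        while offset + chunk <= length and len(starts) < needed:
            starts.append(start + offset)
            offset += chunk
        if len(starts) == needed:
            return starts
    return None

# Hypothetical free gaps (start, length); none is 512 MiB long.
gaps = [(64 * MIB, 336 * MIB), (408 * MIB, 392 * MIB), (1608 * MIB, 440 * MIB)]

single = reserve_single(gaps, 512 * MIB)
chunks = reserve_chunked(gaps, 512 * MIB, 64 * MIB)
print(single, None if chunks is None else len(chunks))
```

The single reservation fails while the chunked one succeeds with eight chunks. The price is bookkeeping: with one contiguous region, checking whether a pointer belongs to the heap is a simple range comparison, whereas a chunked heap needs some lookup structure to map an address to its chunk, which is why the single region is simpler and faster.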
I haven't run Windows in a long time, but I do remember from when I ran some servers that there were a lot of configuration options for how the kernel treats memory allocation for a process. There is also a big difference between Windows versions. For example, XP is optimized by default for having many different applications open: it will swap out and fragment a foreground process even if it hasn't hit any limits, because it anticipates other applications being opened.
I think that a combination of allocating more address space to userland, shutting down all the default services and trimming down the server, along with telling the memory manager to treat Go as a background process, would solve this.