The large memory footprint of the JVM is mostly memory for classes, profiles, and things like that. These are used to create optimised code and to recover when optimisations turn out to be too optimistic. Once your program is optimised and running in steady state, this memory isn't actively used, so it doesn't contend with your application memory and has no impact on cache efficiency.
This sounds like a plausible explanation, but is this verified/verifiable? Are there memory profilers that can show me the relative sizes of the young/old/permanent generation segments of the GC?
I'm always blown away by the memory usage of JVM apps. Part of it is the fact that Java has encouraged insanity-inducing inheritance hierarchies...but it is also incredibly hard to do dead code elimination, considering how static the language's type and compilation model is (I blame dynamic classloading, but that's more of a guess than anything). Maybe what you're saying is the reason we don't see noticeable GC pauses until there are large amounts of data...but it is still a huge pain for low-memory environments like phones, embedded devices, IoT, etc. And while memory usage is always gonna be higher in a GC'd language, the JVM still consumes vastly more memory than other languages like OCaml, D, Go, etc.
Yes, it's called "PermGen" (the permanent generation), and it shows up in any standard JVM profiler. On HotSpot from Java 8 onwards it was replaced by "Metaspace". It roughly represents the memory used by the type system, i.e. class metadata. It can get pretty high if you are doing something like auto-generating types (GUI, build systems, etc.).
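If you'd rather check from inside the process than attach a profiler, the standard `java.lang.management` API lists the JVM's memory pools and how much each one is using. A minimal sketch is below. The class name `MemoryPools` and the method `usedBytesByPool` are my own names for illustration, and the pool names you get back depend on the JVM version and the garbage collector in use, so don't expect the same list everywhere.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

public class MemoryPools {
    // Hypothetical helper: maps "<pool type> / <pool name>" to bytes currently used,
    // for every memory pool the running JVM reports.
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // getUsage() returns null when the pool is no longer valid
            }
            result.put(pool.getType() + " / " + pool.getName(), usage.getUsed());
        }
        return result;
    }

    public static void main(String[] args) {
        // Print one line per pool, with usage in KiB.
        usedBytesByPool().forEach((name, used) ->
            System.out.printf("%-50s %,d KiB%n", name, used / 1024));
    }
}
```

The heap pools correspond to the young and old generation spaces, and the non-heap pools cover class metadata and the JIT's compiled code, which gives a rough way to compare their relative sizes in a running application.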