by charriu 3946 days ago
I personally never understood why Java has built-in types vs. reference types. Why should I care, instead of just letting the compiler handle that for me?
5 comments

  Why should I care
Let's say you were processing map data for OpenStreetMap, which has about 3,720,000,000 nodes [1], and for each node you want to store a latitude and a longitude.

If you use a pair of primitive ints for the latitude and longitude you'll need 8 bytes of memory per node, or 30 gigabytes total.

(Ignoring caching and assuming UseCompressedOops is enabled.) Each boxed int takes 4 bytes for the value plus a 12-byte object header (mark word and compressed class pointer), which comes to 16 bytes and fits the 8-byte object alignment. You'll also need a reference to that object, which is another 4 bytes.

So replacing those two primitive ints with boxed ints will take 40 bytes of memory per node, or 150 gigabytes total. And now you've got a 150 gigabyte heap to garbage collect :)
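To make the arithmetic above easy to check, here is a rough sketch of it in Java. The class and method names are made up for illustration, and the 16-byte and 4-byte sizes are just the figures quoted above hard-coded as assumptions (nothing is measured from a real JVM).

```java
public class NodeMemoryEstimate {
    // Node count quoted in the comment above.
    static final long NODE_COUNT = 3_720_000_000L;

    // Assumed sizes, taken from the comment above (64-bit HotSpot with compressed oops).
    static final int BOXED_INTEGER_BYTES = 16; // one boxed int, header included
    static final int REFERENCE_BYTES = 4;      // one compressed reference

    // Two primitive ints (latitude, longitude) stored inline.
    static long primitiveBytesPerNode() {
        return 2L * Integer.BYTES;
    }

    // Two boxed ints, each needing its own object plus a reference to it.
    static long boxedBytesPerNode() {
        return 2L * (BOXED_INTEGER_BYTES + REFERENCE_BYTES);
    }

    static long totalBytes(long bytesPerNode) {
        return NODE_COUNT * bytesPerNode;
    }

    public static void main(String[] args) {
        System.out.println("primitive bytes/node: " + primitiveBytesPerNode());
        System.out.println("primitive total bytes: " + totalBytes(primitiveBytesPerNode()));
        System.out.println("boxed bytes/node: " + boxedBytesPerNode());
        System.out.println("boxed total bytes: " + totalBytes(boxedBytesPerNode()));
    }
}
```

The totals work out to 29,760,000,000 bytes for primitives and 148,800,000,000 bytes for boxed values, which is where the roughly 30 and 150 gigabyte figures come from.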

That memory increase is the difference between an EC2 instance that costs $1600/year and one that costs $13000 - and primitives will probably perform better as well.

And if you want to store more than two ints for each node, it could be the difference between an off-the-shelf EC2 server and needing to get hold of special high memory servers.

[1] https://www.openstreetmap.org/node/3720000000

Keep in mind that Java came out about 5 years before a certain similar language that has reference types. The designers of that language made some decisions based on trying to learn from things that didn't work out so well in Java.

To some extent both languages were trying to deal with C++'s having three different kinds of types: Atomics, objects and structs. If I had to tell a made-up story, I'd guess that Java's designers looked at that and decided structs were in many ways just a more limited version of classes with some confusing semantic quirks, and therefore decided to nix them. That left atomics and objects, and the rest is history.

C#'s folks then reconsidered it and realized that, from a high level perspective, atomics are really just a special case of structs. (Side note: still telling that made-up story.) So they kept structs, unified them with atomics in the language,* and let the compiler deal with the rest. It works well and seems like an obviously better way to do it, but I'm not sure that insight would have been so obvious in the moment.

* Edit: Oh, and made structs a subtype of System.Object. Unlike Java, .NET's more-or-less fully object-oriented.

C++ doesn't really make a difference between builtin and user-defined data types. They all have value semantics, and a lot of effort has been spent to make user-defined types able to mimic builtin types as closely as possible. Classes and structs have no semantic differences besides default visibility of member names. For reference semantics, you use explicit references.

Java wanted to enforce reference semantics (and heap storage) for classes, but still wanted to retain value semantics (and stack storage) with fundamental types for performance reasons. The result is the mess we have now.
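A small sketch of the difference in semantics, in case it helps. The Counter class is hypothetical and exists only for this illustration; the example shows copy-versus-alias behavior and says nothing about where the JVM actually places the data.

```java
public class ValueVsReference {
    // Hypothetical mutable class, used only for this illustration.
    static final class Counter {
        int count;

        Counter(int count) {
            this.count = count;
        }
    }

    // Primitives have value semantics: assignment copies the value.
    static int copyPrimitive() {
        int a = 1;
        int b = a; // b gets its own copy of the value
        b++;       // changes b only
        return a;  // a is still 1
    }

    // Class instances have reference semantics: assignment copies the reference.
    static int aliasObject() {
        Counter a = new Counter(1);
        Counter b = a; // a and b now refer to the same object
        b.count++;     // visible through a as well
        return a.count; // 2
    }

    public static void main(String[] args) {
        System.out.println(copyPrimitive()); // prints 1
        System.out.println(aliasObject());   // prints 2
    }
}
```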

Many languages older than Java handle that conversion between value and reference types for you.

They just wanted to simplify the compiler.

When Java came out into the open, I had 16 MB of RAM in my 486 and that was more than most of my friends. An int takes 4 bytes allocated on the stack, so there's no GC overhead. A Pentium 133 was bleeding edge and 64 MB of RAM was Unix workstation territory.

Against that backdrop, seemingly bad trade-offs get made for execution performance and to compile the code in a reasonable amount of time. It would be more accurate to say that the decision has not aged very well.

I think a lot of it was for performance concerns. For math-heavy operations, dealing with lots of objects caused quite a bit of garbage collection overhead. This is much less important now (they've optimized garbage collection to the point that it is often irrelevant), but it's still a concern in math-heavy applications.
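As a rough illustration of where those extra objects come from, here is a sketch of the same loop with a primitive accumulator and a boxed one. The class and method names are made up for the example, and it deliberately makes no timing claims: how much the boxed version actually costs depends on the JVM, since the JIT may eliminate some of the allocations.

```java
public class BoxedSum {
    // Primitive accumulator: no objects are created in the loop.
    static long sumPrimitive(long n) {
        long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Boxed accumulator: each += unboxes total, adds, and boxes the result
    // again, which may allocate a new Long on every iteration.
    static long sumBoxed(long n) {
        Long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total; // auto-unboxed on return
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(1000)); // prints 499500
        System.out.println(sumBoxed(1000));     // prints 499500
    }
}
```

Both methods return the same result; the only difference is the garbage the boxed version can leave behind for the collector.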

Here's an article that goes into it in more detail: http://www.javaworld.com/article/2150208/java-language/a-cas...

Because 1995.