| HN Mirror



	by chancho 5319 days ago
	I just about spat my coffee out when I saw that nio bytebuffers used 32 bit ints for everything. (I'm not normally a java guy.) I thought "oh hey a direct byte buffer will be a great way to keep all this data from blowing up the heap AW WTF!!?? ints?!?" Does anyone know the rationale for this? If they had used 64 bit long values, like the underlying OS calls, his whole matrix could have been mapped into a single buffer, making all this list-of-mappings stuff unnecessary. That extra level of indirection normally wouldn't matter much but in this case he's paying the cost 1e12 times over.

3 comments

beagle3 5319 days ago

The 32-bit int limit can be worked around even in a user-level library. But it's much worse than that.

Even if you only need to access 2GB (or you had fixed the Java memory mapping code) you still have a .getDouble() or .putDouble() call for every access; and that's actually a virtual call (and as far as I can tell, even though I only ever used one kind of memory channel, the JVM wouldn't inline it -- although I can't tell for sure, because the JVM also sucks at introspection).

I had real computational code in C that needed to be translated to Java.

First attempt (no memory mapping, converting C structs to Java objects) failed miserably because my structs were 32 bytes each, and the object overhead was 24 or 32 (don't remember), which took me beyond physical memory (using virtual memory caused a slowdown of ~1000).

2nd attempt, I switched to memory mapped arrays -- much better, only ~15 times slower. But I also had to write my own sort, because Array.sort() or whatever it was called was allocating 48 bytes for each 4 byte int to sort (wtf?), blowing memory usage up again.

That's a cost people using Hadoop pay all the time -- which makes its popularity kind of surprising to me. You need 10 times less CPU if you do things right -- and at that scale, maintenance & hardware cost as much as salaries....


Scaevolus 5319 days ago

Arrays.sort() creates a full copy of the input data before sorting.


beagle3 5319 days ago

It was copying more, or for some reason expanding from ints to Integers -- it multiplied the required memory by 12.

I don't have access to that source code anymore, and I don't remember what exactly I used, but -- given that I had to implement my own data structure over mmap -- it was an array of int, which needed sorting through a comparator class I supplied. That comparator looked up the structs corresponding to ints, and compared them. Perhaps it was just crazily instantiating the comparator class or something.
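One thing that is certain about the standard library: the java.util.Arrays.sort overloads that take a Comparator accept only object arrays, so sorting int values with a custom comparator through that API means boxing them as Integer first. Whether that is what happened in the code described above is a guess. The sketch below shows one way to write "my own sort" over a primitive int[] without boxing or extra allocation. The class name IntIndexSort, the use of heapsort, and the Java 8 IntBinaryOperator interface are assumptions chosen for illustration, not the original code.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

/** Hypothetical helper: in-place heapsort of an int[] using a primitive comparator. */
final class IntIndexSort {

    /**
     * Sorts a in ascending order according to cmp, which follows the
     * Comparator.compare convention (negative, zero, positive).
     * No boxing and no temporary arrays; the sort is not stable.
     */
    static void sort(int[] a, IntBinaryOperator cmp) {
        int n = a.length;
        // Build a max-heap.
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(a, i, n, cmp);
        }
        // Repeatedly move the current maximum to the end of the shrinking heap.
        for (int end = n - 1; end > 0; end--) {
            swap(a, 0, end);
            siftDown(a, 0, end, cmp);
        }
    }

    private static void siftDown(int[] a, int root, int n, IntBinaryOperator cmp) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= n) {
                return;
            }
            // Pick the larger of the two children.
            if (child + 1 < n && cmp.applyAsInt(a[child], a[child + 1]) < 0) {
                child++;
            }
            if (cmp.applyAsInt(a[root], a[child]) >= 0) {
                return;
            }
            swap(a, root, child);
            root = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        // Stand-in for the memory-mapped structs: each int is an index into keys.
        double[] keys = {5.0, 1.0, 3.0};
        int[] order = {0, 1, 2};
        sort(order, (x, y) -> Double.compare(keys[x], keys[y]));
        System.out.println(Arrays.toString(order));
    }
}
```

The comparator receives the raw int values, so it can look up whatever record each int refers to (here a plain double[] stands in for the mapped structs) without any per-element object being created.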


SeanLuke 5319 days ago

I see nothing in the Java6 Arrays.java source code which would support this claim.


Scaevolus 5319 days ago

Oops, I got Arrays.sort confused with Collections.sort


eternalban 5319 days ago

You can use sun.misc.Unsafe [1] and get 64 bit (long) addressing. It is JVM dependent, of course, but typically in such use cases you have pretty tight control over the stack. Unsafe pretty much covers the gap between C/JNI and NIO.

[1]: http://www.javasourcecode.org/html/open-source/jdk/jdk-6u23/...

[edit: see 'public native byte getByte(long address)', for ex.]
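A minimal sketch of what long-addressed access through sun.misc.Unsafe can look like. The class name OffHeapDoubles and its methods are hypothetical; obtaining the instance through the private theUnsafe field is the usual reflective workaround rather than a supported API, and recent JDKs deprecate these memory-access methods and may warn or refuse at run time, so treat this as JVM dependent in exactly the sense described above.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical helper: a long-indexed block of doubles in native memory. */
final class OffHeapDoubles {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long base;    // raw native address returned by allocateMemory
    private final long length;  // number of doubles

    private static Unsafe loadUnsafe() {
        try {
            // Unsupported: reads the private singleton field by reflection.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available on this JVM", e);
        }
    }

    OffHeapDoubles(long length) {
        this.length = length;
        this.base = UNSAFE.allocateMemory(length * Double.BYTES);
        // Native memory is not zeroed for us.
        UNSAFE.setMemory(base, length * Double.BYTES, (byte) 0);
    }

    double get(long index) {
        checkIndex(index);
        return UNSAFE.getDouble(base + index * Double.BYTES);
    }

    void put(long index, double value) {
        checkIndex(index);
        UNSAFE.putDouble(base + index * Double.BYTES, value);
    }

    /** Must be called exactly once; the garbage collector does not manage this memory. */
    void free() {
        UNSAFE.freeMemory(base);
    }

    // Unsafe itself does no bounds checking; a bad address can crash the JVM.
    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    public static void main(String[] args) {
        OffHeapDoubles d = new OffHeapDoubles(1_000);
        d.put(999L, 2.5);
        System.out.println(d.get(999L));
        d.free();
    }
}
```

The index and the address are both long, so nothing here is limited to 2^31 elements other than available memory. The bounds check is added by this sketch; Unsafe performs none of its own.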


jbooth 5319 days ago

Yeah, it's stupid. I think the underlying rationale was that the arrays are indexed with ints, and that decision was made in 1995 with universally 32-bit machines.

You can hack around it by having multiple memory mappings over a file starting at different offsets, but honestly, just use C if you're doing something that's math-heavy and needs really big memory-mapped files. C's better for both of those anyway.
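The multiple-mappings workaround can be sketched roughly as follows. The class name ChunkedDoubleFile, its constructor parameters, and the fixed chunk size are assumptions made for illustration; the only library facts relied on are that FileChannel.map takes a long position and that a single mapping is limited to Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: a long-indexed array of doubles spread over several file mappings. */
final class ChunkedDoubleFile {
    private final MappedByteBuffer[] chunks;
    private final long doublesPerChunk;

    ChunkedDoubleFile(Path file, long totalDoubles, long doublesPerChunk) throws IOException {
        // Each mapping is int-indexed in bytes, so a chunk must stay under 2 GiB.
        if (doublesPerChunk <= 0 || doublesPerChunk > Integer.MAX_VALUE / Double.BYTES) {
            throw new IllegalArgumentException("chunk too large for a single mapping");
        }
        this.doublesPerChunk = doublesPerChunk;
        int chunkCount = (int) ((totalDoubles + doublesPerChunk - 1) / doublesPerChunk);
        this.chunks = new MappedByteBuffer[chunkCount];
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            for (int i = 0; i < chunkCount; i++) {
                long firstDouble = i * doublesPerChunk;
                long count = Math.min(doublesPerChunk, totalDoubles - firstDouble);
                // The file offset is a long; only the size of one mapping is int-limited.
                chunks[i] = channel.map(FileChannel.MapMode.READ_WRITE,
                        firstDouble * Double.BYTES, count * Double.BYTES);
            }
        }
        // The mappings stay valid after the channel is closed.
    }

    double get(long index) {
        // The extra level of indirection: pick the chunk, then the offset inside it.
        MappedByteBuffer chunk = chunks[(int) (index / doublesPerChunk)];
        return chunk.getDouble((int) (index % doublesPerChunk) * Double.BYTES);
    }

    void put(long index, double value) {
        MappedByteBuffer chunk = chunks[(int) (index / doublesPerChunk)];
        chunk.putDouble((int) (index % doublesPerChunk) * Double.BYTES, value);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunked-doubles", ".bin");
        // Tiny chunks so the chunk boundary is exercised; real use would pick a large chunk size.
        ChunkedDoubleFile data = new ChunkedDoubleFile(file, 10, 4);
        data.put(9L, 3.5);
        System.out.println(data.get(9L));
    }
}
```

Every access pays for a division, a modulo, and an array lookup before the buffer call itself, which is the per-element cost the top comment complains about. Choosing a power-of-two chunk size would let those be replaced by a shift and a mask.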
