| There are 2+1 major types of ByteBuffers:
Heap: backed by a byte[] (the other buffer types are backed by char[], int[], etc.)
Direct: backed by native (C) memory allocated via mmap (on linux). mmap can map to RAM or to a file; the file case is the "+1", a memory-mapped buffer. Memory-mapped files are not an interesting case for an NIO impl. that works with sockets. (On a side note: FileChannel.transferTo(SocketChannel) doesn't involve memory mapping when the kernel supports a direct transfer, i.e. sendfile on linux. Windows never supports it, though.)
Most impl. use heap ByteBuffers; parsing then requires state machines, and these are often simplified by copying the buffers. Blocking IO doesn't really need a state machine, as the stack serves that purpose. Then there is the reactive-like pattern (submitting tasks to an ExecutorService), which costs some more latency. Certainly, it's easier to work with and reason about, yet the more hand-offs there are, the worse the performance/latency is.
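For illustration, a minimal sketch of the heap vs direct distinction as seen from the API. The class and method names (BufferKinds, describe) are made up for this example; only the java.nio.ByteBuffer calls are real.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of any library.
final class BufferKinds {

    // Reports whether a buffer is direct and whether it exposes a backing byte[].
    static String describe(ByteBuffer buf) {
        return "direct=" + buf.isDirect() + ", hasArray=" + buf.hasArray();
    }

    public static void main(String[] args) {
        // Heap buffer: lives on the Java heap, backed by an accessible byte[].
        ByteBuffer heap = ByteBuffer.allocate(4096);
        // Direct buffer: backed by native memory outside the Java heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(4096);
        // Wrapping an existing byte[] also yields a heap buffer, without copying.
        ByteBuffer wrapped = ByteBuffer.wrap(new byte[4096]);

        System.out.println("heap:    " + describe(heap));
        System.out.println("direct:  " + describe(direct));
        System.out.println("wrapped: " + describe(wrapped));
    }
}
```

A heap buffer handed to a SocketChannel still has to be copied to native memory before the syscall, which is one reason the choice matters for latency.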
There are minor issues, like the choice of a good queue. It is an important one, as java lacks MultiProducer/SingleConsumer queues out of the box, or even single producer/single consumer ones. Java does have MP/MC queues (ConcurrentLinkedQueue is an outstanding one), but one has to pay some extra price (incl. false sharing sometimes) to use them.
Ultimately, blocking IO cannot be "faster" than NIO per se, since under the hood it uses poll(2)[0] with one socket. Before that, it copies the java byte[] to a new location; for smaller byte[] that location is the stack. Technically, one can blow up the JVM if the stack is very tiny while entering socket.getOutputStream().write(byte[]).
Lastly, Selector.wakeup() has a stupid issue: it enters a synchronized block each time, even if there is an outstanding wake-up request already. Wakeup requests are implemented via pipes on linux (and a socket pair on Windows), which requires a kernel mode switch. During the wakeup, all the threads attempting to hand over a task block on that very selector for no real reason. This can be worked around with a CAS, so that only one thread actually enters the monitor.
I will repeat myself: blocking IO doesn't have predictable latency, and it cannot be enforced. In the end it's all about the latency, as bandwidth can be bought and more machines can be deployed, but you can't buy latency.
[0] http://linux.die.net/man/2/poll
Some internal stuff on heap vs direct buffer: http://stackoverflow.com/a/11004231 |
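A rough sketch of the CAS workaround for Selector.wakeup() described above, assuming one selector thread and many producer threads. GuardedWakeup, requestWakeup, clearPending and demo are hypothetical names for this example; only Selector and AtomicBoolean come from the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: lets only one producer per cycle call Selector.wakeup().
final class GuardedWakeup {
    private final Selector selector;
    private final AtomicBoolean wakeupPending = new AtomicBoolean(false);

    GuardedWakeup(Selector selector) {
        this.selector = selector;
    }

    // Producers call this AFTER enqueuing their task.
    // Returns true only for the thread that actually invoked wakeup().
    boolean requestWakeup() {
        if (wakeupPending.compareAndSet(false, true)) {
            selector.wakeup(); // only the CAS winner enters the monitor
            return true;
        }
        return false; // a wake-up is already outstanding, nothing to do
    }

    // The selector thread calls this after select() returns and BEFORE
    // draining the task queue, so that tasks enqueued later trigger a new wakeup.
    void clearPending() {
        wakeupPending.set(false);
    }

    // Single-threaded walk through one cycle, for demonstration only.
    static boolean[] demo() {
        try (Selector sel = Selector.open()) {
            GuardedWakeup guard = new GuardedWakeup(sel);
            boolean first = guard.requestWakeup();  // wins the CAS, calls wakeup()
            boolean second = guard.requestWakeup(); // skipped, already pending
            sel.selectNow();                        // consumes the pending wake-up
            guard.clearPending();
            boolean third = guard.requestWakeup();  // new cycle, wins again
            return new boolean[] {first, second, third};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

The ordering matters: a producer enqueues first and requests the wake-up second, while the selector thread clears the flag first and drains the queue second. Otherwise a task could sit in the queue with nobody woken up to run it.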