by ciucanu 3528 days ago
It looks like a faster version of HDFS since it's written in C++ (vs Java).

Another important aspect is that it is using SSD + SATA (I suppose), which could be a better option than standard SATA/SSD or an LV cache using SATA + SSD.

Even though it's new, if it proves to be faster it may be adopted in the Hadoop ecosystem in the future. HDFS has a lot of features, being a mature piece of software, but it falls short on response time.


"It looks like a faster version of HDFS since it's written in C++ (vs Java)."

This is a non sequitur. The conclusion does not follow from the premise.

During non-GC periods, probably true. But having a realtime filesystem service that is prone to stop-the-world GC pauses is a showstopper for many applications.

Also, a C++ implementation is likelier to use far less memory than a Java implementation, assuming the skills of both programmers are roughly equal.

The underlying local filesystem on each node is not truly realtime, so a "realtime distributed file system" is already quite a stretch. Also, the JVM can keep worst-case pause times below a few tens of ms (with a properly tuned G1 or CMS collector), which is lower than the worst-case latency induced by network + I/O.

As for using less memory - you don't allocate buffers for file data on the JVM heap. You allocate them in native memory, exactly as you would in C++. It is therefore possible to create a JVM-based file system that handles petabytes of data with as little as 100 MB of heap, used mostly for small temporary objects.
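To make the off-heap point concrete, here is a minimal sketch using the standard `java.nio.ByteBuffer.allocateDirect` call. The class name `OffHeapBufferSketch`, the helper `roundTrip`, and the 16 MiB block size are made up for illustration; this is not how HDFS or the project under discussion actually manages its buffers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows file-data-sized buffers living outside the JVM heap.
class OffHeapBufferSketch {

    // Copies a payload through a direct (off-heap) buffer and returns what was read back.
    static byte[] roundTrip(byte[] payload) {
        // allocateDirect reserves native memory; only the small ByteBuffer
        // wrapper object itself lives on the garbage-collected heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(payload.length);
        buf.put(payload);   // write into native memory
        buf.flip();         // switch the buffer from writing to reading
        byte[] out = new byte[buf.remaining()];
        buf.get(out);       // read back into an ordinary heap array
        return out;
    }

    public static void main(String[] args) {
        // Assumed block size for illustration only.
        ByteBuffer block = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        System.out.println("direct=" + block.isDirect() + " capacity=" + block.capacity());

        byte[] echoed = roundTrip("file data".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(echoed, StandardCharsets.UTF_8));
    }
}
```

The GC never has to scan or copy the contents of a direct buffer, so heap size and pause times depend on the small bookkeeping objects rather than on how much file data is in flight.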

Also, the code here is using mutexes a lot to synchronize threads and lock out whole objects. Therefore I think these "realtime" claims are quite exaggerated.

You're using the academic version of realtime, not the one that anybody cares about. HDFS's biggest problem is, and has always been, that it's literally impossible to tune it to give anything like reliable performance, mostly because the nameserver is a single point of lag for the entire system. "Worst case network and IO" latency is a huge stretch. Network performance is predictably sub-ms if you're using a network designed for modern distributed computing. (A real stretch, I know, since almost all HDFS installations are on old-school core-router-tree infrastructure.) The IO operations are incredibly unpredictable - for one client at a time. Having individual servers with 10-20 ms worst-case performance hiccoughs is nowhere near as bad for a system as all of your clients hiccoughing for even 5 ms at the same time.

HDFS's biggest problem is its SPOF master-slave architecture, not the JVM or GC. With a truly distributed shared-nothing system, Java GC would not be a problem, because servers can now run with no major GC for hours or days. So two servers or clients doing GC at the same time is very unlikely. And even if some of them do, the pauses from GC are much more predictable than the pauses from I/O, which on a loaded system can take seconds, not milliseconds.

Also if GC was such a huge problem, exchanges or HFT companies wouldn't use Java for their low latency stuff, and there definitely are companies which do.

> Also if GC was such a huge problem, exchanges or HFT companies wouldn't use Java for their low latency stuff, and there definitely are companies which do.

Can you name one?

> As for using less memory - you don't allocate buffers for file data on the JVM heap.

I meant the code size and heap allocations for data structures, not file buffers.

And 100MB is huge compared to many C++ programs. And that's on top of the Java runtime!

Sure, and this C++ DFS's memory use is probably huge compared to many hand-crafted assembly or C programs from the 1980s. But who cares? 100 MB or even 1 GB is really tiny for today's server hardware. And the Java runtime itself is only a few MB, really. What takes most memory in many Java programs (e.g. IDEs) is code and libraries.

Size can lead to a tremendous difference in performance on modern CPUs, particularly if you can take advantage of L2/L3 instruction and data caches. It still matters, even on modern "big memory" systems where gigabytes of installed RAM are the norm.