by cestith 3025 days ago
It's about using one language for its features and porting only the critical sections to C++ behind a clean API. This is the sort of advice we've been giving people for decades. Choose the language for what you want to build, measure and profile performance if necessary, find the bottleneck on the hot path, decouple that from the bulk of the code, and drop to a lower level for performance only in that clearly defined section.
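To make the "measure first" step concrete, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The functions `build_rows`, `score`, `pipeline`, and `hottest_function` are hypothetical names invented for illustration; the point is only that the profiler, not intuition, picks the section worth porting.

```python
import cProfile
import pstats


def build_rows(n):
    # Cheap setup work: not worth optimizing.
    return list(range(n))


def score(rows):
    # Deliberately heavy pure-Python loop: the hot path in this toy example.
    total = 0
    for value in rows:
        k = 0
        while k < 50:
            total = (total + value * k) % 1_000_003
            k += 1
    return total


def pipeline(n):
    return score(build_rows(n))


def hottest_function(func, *args):
    """Run func under the profiler and return (name of the function with
    the most self time, func's result)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda kv: kv[1][2])
    return name, result


if __name__ == "__main__":
    name, _ = hottest_function(pipeline, 2000)
    print("candidate for a lower-level rewrite:", name)
```

Only the function the profiler singles out would be a candidate for moving behind an API and rewriting; everything else stays in the higher-level language.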

They managed to generalize one application that meets their feature needs into a front end for another existing application, one with fewer features but better performance, which serves as the back end. They're optimizing their hot path by decoupling it from the rest of the application and handing off to C++ code they didn't even have to write. Adding pluggable storage engines to Cassandra means that if they make the API smooth enough they can have engines in C, C++, Erlang, Go, Rust, ML, or whatever in the future without changing their front end. That's a big win even beyond this tail latency issue.
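A rough sketch of what "pluggable" means here, in Python for brevity. This is not Cassandra's actual storage engine API: `StorageEngine`, `InMemoryEngine`, and `FrontEnd` are hypothetical names, and the in-memory engine just stands in for a native back end wrapped behind the same interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StorageEngine(ABC):
    """Hypothetical narrow interface the front end codes against."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...


class InMemoryEngine(StorageEngine):
    """Stand-in back end; a real one could wrap a native library instead."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


class FrontEnd:
    """Owns the features (here only key namespacing) and delegates storage."""

    def __init__(self, engine: StorageEngine) -> None:
        self._engine = engine

    @staticmethod
    def _encode(table: str, key: str) -> bytes:
        return f"{table}/{key}".encode()

    def write(self, table: str, key: str, value: str) -> None:
        self._engine.put(self._encode(table, key), value.encode())

    def read(self, table: str, key: str) -> Optional[str]:
        raw = self._engine.get(self._encode(table, key))
        return None if raw is None else raw.decode()

    def remove(self, table: str, key: str) -> None:
        self._engine.delete(self._encode(table, key))
```

Because `FrontEnd` only ever sees `put`, `get`, and `delete`, swapping in an engine written in another language would mean writing one more `StorageEngine` subclass around its bindings, with no change to the front end.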

1 comment

Well, other than the storage engine, the next big part of a database is the query planner/optimizer, which Cassandra doesn't have (due to its simple KV nature). So there isn't much remaining. As a long-term plan, rewrite it all and you have a single code base, and you'll benefit from mighty C++ in the other components of the database. And there is still room for more optimizations: SIMD, ...

The GC problem is not limited to C*. This shit (the virtual machine) is hitting the whole Hadoop stack: HDFS, Hive, Spark, Flink, Pig...

An immense number of tickets in any fairly large cluster are related in some way to GC and JVM behavior.