No offense, but have you spent much time with Go or Rust? In most cases Go's performance isn't far off C++'s, and it preserves safety, improves convenience and error handling, and adds built-in concurrency.
The Graphviz/SVG output from the profiler shown there is very nice. Does something similar exist for other languages? I particularly like the scaling of the node size by the appropriate value (CPU time, memory use, etc.).