What I think is interesting about this is that they weren't able to easily measure or find these hotspots using existing tools -- they needed a combination of visualization and data munging to do so.
Visualization is an often-overlooked tool in CS -- for example, IDEs do little to no visualization... only LightTable is starting to break out of the traditional text document. It also shows that, depending on the problem, visualization & data can be morphed and stretched to provide new insights when others might have walked away.
So why isn't this something that's a part of job interviewing or a bigger part of our normal toolbox as engineers?
It's often overlooked because generating meaningful data that can provide visual insight is usually very difficult. Right now I'm working on a blog post that goes over how you can use motion bubble charts to track code changes, and I use GitLab as an example. You can find a draft of the blog at:
Note the blog post is still in DRAFT state, so there are broken links, grammatical errors, and whatnot.
Capturing meaningful data at enterprise scale requires a lot of effort. There is a reason why I ended up creating my own real-time process monitoring system:
Really great stuff. The spot where he gets to a pretty good description of how he uses his flame graph is roughly here: https://youtu.be/O1YP8QP9gLA?t=611
With respect to that blog post, the bit about the truncated towers is a bit of a red herring if you're 100% new to flame graphs.
Generally you're right that the wide sections are where you want to focus your attention when looking for optimizations. The point I was trying to make in the blog post is that we had to take the flame graph visualization a step further to eliminate the noise obscuring a major hot spot. The large number of broken stacks was one of the first hurdles we had to cross to improve the clarity of the visualization.
BTW, this is a different flame graph and optimization than the one discussed in the YouTube video. We use flame graphs extensively throughout Netflix.
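For anyone 100% new to flame graphs, the "wide sections" idea can be sketched in a few lines: a frame's width is the share of samples in which it appears. This is only a rough illustration, assuming input in the folded "frame;frame;frame count" format that the stackcollapse scripts produce. The function name and the sample stacks are made up, and it aggregates by frame name rather than by full call path as a real flame graph does.

```python
from collections import Counter

def frame_widths(folded_lines):
    """Return inclusive sample counts per frame name from folded-stack lines."""
    widths = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # Each line is "root;child;leaf <sample count>".
        stack, _, count = line.rpartition(" ")
        # Count a frame once per stack so recursion does not inflate it.
        for frame in set(stack.split(";")):
            widths[frame] += int(count)
    return widths

# Hypothetical samples, not taken from any real profile.
samples = [
    "main;handle_request;parse 30",
    "main;handle_request;render 50",
    "main;gc 20",
]
widths = frame_widths(samples)
total = sum(int(line.rpartition(" ")[2]) for line in samples)
for frame, n in widths.most_common():
    print(f"{frame:15s} {n:4d} {100 * n / total:5.1f}%")
```

A stack that has lost its root frames no longer shares a prefix with the intact stacks, which is why broken stacks show up as separate towers instead of adding to the width of the real hot spot.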
Can you write up how you fixed the broken call stacks? I've used Brendan's tools (with java-perf-map, also an awesome tool) to generate flame graphs for Scala code and had no idea I could only see 127 frames.
I'm currently using flame graphs at work. If your application hasn't been profiled recently, you'll usually get lots of improvement for very little effort.
Some 15 minutes of work improved CPU usage of my team's biggest fleet by ~40%. Considering we scaled up to 1500 c3.4xlarge hosts at peak in NA alone on that fleet, those 15 minutes kinda made my month :)
One thing to note once you eliminate the easy pickings is that as you go higher up the call graph, the profiler visualization is often misleading. There may be sections of code without safepoints, and stuff that appears wide on the flame graph may just be getting blamed for adjacent code that doesn't have safepoints.
Profiling in general is a really good thing when you're seeing odd load/timing/performance issues... I once found a project was storing its configuration settings (loaded/cached from DB) in a really badly performing way: an in-memory datatable queried with text queries instead of a hashtable (not my design).
A single call wasn't so bad, but the lookup was happening many hundreds of times per request, adding seconds to some requests. Wild how much difference a relatively small thing can make.
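To make the datatable-vs-hashtable difference concrete, here is a rough sketch of the same kind of problem. It is not the original project's code: the row layout, key names, and table size are all assumptions, and a list of dicts stands in for the in-memory datatable.

```python
import timeit

# Hypothetical settings table; a list of rows stands in for the datatable.
rows = [{"key": f"setting_{i}", "value": str(i)} for i in range(2000)]

def lookup_by_scan(key):
    """Linear scan on every lookup, roughly what a text query over a table does."""
    for row in rows:
        if row["key"] == key:
            return row["value"]
    raise KeyError(key)

# Build the hash table once; each lookup after that is a single dict access.
settings = {row["key"]: row["value"] for row in rows}

def lookup_by_hash(key):
    return settings[key]

# Both approaches must agree on the result.
assert lookup_by_scan("setting_1999") == lookup_by_hash("setting_1999")

# Time many hundreds of lookups, as would happen within one request.
scan = timeit.timeit(lambda: lookup_by_scan("setting_1999"), number=500)
hashed = timeit.timeit(lambda: lookup_by_hash("setting_1999"), number=500)
print(f"scan: {scan:.4f}s  hash: {hashed:.4f}s for 500 lookups")
```

The absolute numbers depend on the machine, but the scan cost grows with the number of settings while the dict lookup stays roughly constant.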
That brings up another distinction -- sampling profilers generally don't distinguish between a method that takes very little time to run but is called very often and another method that is pretty expensive but is not called very often. Both just show up as a wide frame.
Ultimately, we do care about the total time taken, but the approaches necessary for the two cases above are very different. In many cases, the method that is simply called very often will call for some type of caching solution in the caller, while the more expensive method will require retooling within the method itself.
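For the "cheap but called very often" case, a caching fix might look roughly like the sketch below. Everything here is hypothetical (the function, the tags, the request shape), and it uses memoization via functools.lru_cache as one possible form of caching. The call counter is only there to show how many times the underlying function really runs.

```python
from functools import lru_cache

calls = {"raw": 0}

def parse_locale(tag):
    """Cheap per call, but (hypothetically) invoked many times per request."""
    calls["raw"] += 1
    return tuple(tag.lower().split("-"))

@lru_cache(maxsize=None)
def parse_locale_cached(tag):
    # Only cache misses reach the real function.
    return parse_locale(tag)

def handle_request(tags, parse):
    return [parse(tag) for tag in tags]

tags = ["en-US", "fr-FR", "en-US"] * 200  # 600 lookups, 2 distinct values

handle_request(tags, parse_locale)
uncached_calls = calls["raw"]  # 600: one real call per lookup

calls["raw"] = 0
handle_request(tags, parse_locale_cached)
cached_calls = calls["raw"]    # 2: one real call per distinct tag

print(uncached_calls, cached_calls)
```

The expensive-but-rare case gets nothing from this; there the work inside the method itself has to change.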
Well it depends. Stack is just a data structure. I'd say that the fact alone that you go from a deep call stack to "iterative" version where you can clearly see the stack in the code doesn't automatically make it much better.
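A small sketch of what "the stack is just a data structure" means in practice, using a made-up nested-list tree. Both functions do the same traversal; the second one just keeps the stack in an ordinary list instead of in call frames.

```python
def depth_recursive(node):
    """Depth of a nested-list tree, using the language call stack."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth_recursive(child) for child in node), default=0)

def depth_iterative(node):
    """Same computation, with the stack held in an ordinary list."""
    best = 0
    stack = [(node, 0)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, list):
            best = max(best, depth + 1)
            # Push children with their depth, mirroring what a call frame would hold.
            stack.extend((child, depth + 1) for child in current)
    return best

tree = [1, [2, [3, [4]]], [5]]
assert depth_recursive(tree) == depth_iterative(tree) == 4
```

The iterative version moves the bookkeeping from call frames to the heap, which changes where the depth limit comes from, but the amount of work is essentially the same.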
Indeed, this is normal. Apache Camel produced such huge stack traces that they refactored the routing system specifically to reduce AsyncCallback usage and shorten stack traces; at one time Camel would dump traces thousands of lines long. However, pointing this out doesn't actually address the question: is there a performance issue indicated by these huge call stacks?
I've wondered about the question myself when encountering incredibly long stack traces while troubleshooting Java systems. I've also wondered if there is some more general dysfunction indicated. I've seen impressive stack traces in C and C++, but nothing quite like what I've found in Java. What is the experience of C# programmers?