by crissaegrim 710 days ago
I had the pleasure to work with Dick on getting KUtrace to work on Android devices last year. It was a great experience to work with one of the greats in systems performance. He was a wealth of information regarding performance bottlenecks and optimizations.

KUtrace is absolutely one of the most powerful tools I've used for deeply understanding performance bottlenecks (after isolating issues) such as poor scheduling behavior. I would highly recommend reading his book "Understanding Software Dynamics" [1] if you are interested in learning more about KUtrace or performance bottlenecks/optimizations in general. The book is quite dense and dives deep into the performance characteristics of many examples of the five fundamental resources (according to Dick): CPU, Memory, Disk/SSD, Network, and Software critical sections.

[1]: https://www.oreilly.com/library/view/understanding-software-...

4 comments

> performance bottlenecks/optimizations in general

How applicable is it to the general case? I'm deeply interested in the topic, but unlikely to actually be running KUtrace, fwiw.

So the book is split into four major sections: Measurement, Observation, KUtrace, Reasoning.

"Measurement" delves into understanding and measuring four fundamental resources: CPU, Memory, Disk/SSD, and Network. This section is quite dense and explores both the depth and breadth of program performance. For example, there is a chapter on optimizing code to use caches more efficiently. That said, this section is obviously not a complete exploration of every aspect of performance, as many more things can affect performance in systems as complex as modern computers. What Dick does in this section is give you more tools in your toolbelt for understanding performance.

"Observation" looks at existing tooling (so profilers, tracing tools, etc.) and discusses where they are useful or where they fall short.

"KUtrace" introduces KUtrace, its kernel module, and its timeline visualization tool. It discusses its design and implementation and why it is so fast and low-overhead.

"Reasoning" has case studies that look at particular kinds of performance pathologies, such as "waiting for CPUs". Dick uses KUtrace here to tease out the underlying inefficiencies in the analyzed programs.

So the first two sections are essentially independent of whether you want to use KUtrace, while the last two are about KUtrace and how to use it to understand performance bottlenecks. Even if you don't use KUtrace, the "Reasoning" section can still be insightful imo, as KUtrace is just a tool at the end of the day and the real insight is what is causing the performance issue and why.

Thanks. I appreciate the thoughtful response, especially as so many others are also clamouring for comments regarding their specific cases.

Thanks, looks interesting. Does it cover measuring memory bandwidth consumption? This is something I feel there is a lack of good tooling for.

What precisely are you trying to measure? Theoretically, if you know which performance counters you want to measure, you can replace the IPC counter in the kernel module. I believe Dick has a different version of the kernel module which measures LLC misses instead of CPU cycles. Does that answer your question?

Hey, thanks for the response. Is it just a matter of measuring the LLC miss rate and then figuring out the max DRAM bandwidth somehow? What about in a multicore setting? NUMA? It would be nice to have a library or tool that works this out - always surprises me there isn't something off the shelf.

You might be interested in Intel VTune then, if you have an Intel CPU. I believe it has a profiling option that shows memory bandwidth over time [1].

[1]: https://www.intel.com/content/www/us/en/docs/vtune-profiler/...
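
On the LLC-miss question above: as a rough back-of-the-envelope sketch (not something taken from the book or from KUtrace), each LLC miss pulls roughly one cache line from DRAM, so misses times line size over the sampling interval gives a lower bound on read bandwidth. The 64-byte line size, the function names, and the 25.6 GB/s peak figure below are all assumptions for illustration. The estimate ignores writebacks, and it may also miss hardware prefetch traffic depending on what the counter actually counts, so the real number is probably higher.

```python
# Rough DRAM read-bandwidth estimate from LLC miss counts.
# Assumption: every LLC miss fetches exactly one cache line from DRAM.
CACHE_LINE_BYTES = 64  # assumption: common line size, check your CPU


def estimate_read_bandwidth(llc_misses_per_core, interval_seconds,
                            line_bytes=CACHE_LINE_BYTES):
    """Return estimated bytes/second, summing misses over all sampled cores."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    total_misses = sum(llc_misses_per_core)
    return total_misses * line_bytes / interval_seconds


def fraction_of_peak(bandwidth_bytes_per_s, peak_bytes_per_s):
    """Return the estimated bandwidth as a fraction of a known peak."""
    return bandwidth_bytes_per_s / peak_bytes_per_s


# Hypothetical sample: four cores, counts taken over a 1-second window.
misses = [20_000_000, 15_000_000, 10_000_000, 5_000_000]
bw = estimate_read_bandwidth(misses, interval_seconds=1.0)
# Hypothetical peak for the memory system under test (25.6 GB/s).
share = fraction_of_peak(bw, peak_bytes_per_s=25.6e9)
print(f"{bw / 1e9:.1f} GB/s, {share:.1%} of assumed peak")
```

For multicore you sum the per-core counts as above; for NUMA you would presumably do the same per socket and compare against that socket's own peak, since remote accesses do not hit the local memory controller.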

Perfetto on Android is very very slick. Why did you need KUtrace? What was perfetto missing?

Nice handle btw. Grinding for that was unforgettable...

Perfetto is pretty cool (if a very overloaded term, since there's the perfetto UI, perfetto backend, etc.), but we were generally interested in fairly low-level aspects, like waiting on memory, which require low-overhead tracing. ftrace (which perfetto uses under the hood for all the system events) does have observable overhead. KUtrace has nifty visualizations for the different wait kinds (waiting for CPU, locks, memory, etc.). There was also the novelty of trying to get KUtrace to work with Android.

The upside to perfetto of course is the much much richer tooling, infrastructure, and ease of use since it comes pre-installed on your phone.

> Nice handle btw. Grinding for that was unforgettable...

Haha thanks. Symphony of the Night is easily one of my favorite games -- I can pick it up any time and play it until 200.6% completion ;)

Interesting! Does it also work on non-rooted Android devices?

No, unfortunately it does not. You need root to remount the read-only system partition, to insert the kernel modules, and to turn SELinux off. We used a "userdebug" build to get root, but I imagine most of this could also be done with a phone rooted through other means (I haven't tried it, however).
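
For anyone curious what those three steps look like in practice, here is a minimal dry-run sketch, assuming a userdebug build with `adb` on the host. It only builds the command lines and prints them; nothing is executed. The module path is a hypothetical placeholder, not the location used in the actual port, and a real setup may well need extra steps that are not shown here.

```python
# Dry-run sketch: build (but do not run) the adb commands for the steps above.
# Assumption: userdebug build, so "adb root" is permitted.
import shlex


def root_setup_commands(module_path):
    """Return the adb command lines as argv lists, in the order described."""
    return [
        ["adb", "root"],                              # restart adbd as root
        ["adb", "remount"],                           # remount system partition read-write
        ["adb", "shell", "insmod", module_path],      # insert the kernel module
        ["adb", "shell", "setenforce", "0"],          # switch SELinux to permissive
    ]


# Hypothetical on-device path to the tracing module.
commands = root_setup_commands("/data/local/tmp/kutrace_mod.ko")
for argv in commands:
    print(shlex.join(argv))
```

To actually run them you could pass each list to `subprocess.run(argv, check=True)`, but I'd try them by hand first.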