by ithkuil 1764 days ago
I hate to be "that guy" too, but coming from somebody who really likes Rust and is using it more and more (also at $dayjob now) we must admit that Go tooling is one step ahead. CPU profiler, allocation and heap profiler, lock contention profiler. It all comes out of the box.

Yes, you have cargo flamegraph for profiling locally and you now have pprof-rs to mimic Go's embedded pprof support. But allocation heap profiling is still something I struggle with.

I saw there was a pprof-rs PR with a heap profiler, but there was some doubt as to whether it worked correctly. To get a feel for how that approach would work without having to fork pprof-rs, I implemented the https://github.com/mkmik/heappy crate. I can use it to produce memory allocation flamegraphs (using the same "go tool pprof" tooling!) in real code I run, and figure out whether it works in practice before pushing it upstream.

But stuff you take for granted, like figuring out which structure accounts for the most used memory, is very hard to achieve. The servo project uses an internal macro that helps you trace object sizes, but it's hard to use outside the servo project.

The GC makes some things very easy, and it's not just about programmers not having to care about memory; it's also that the same reference tracing mechanism used to implement GC can be used to cheaply get profiling information.

3 comments

> But allocation heap profiling is still something I struggle with.

Have you tried this one? https://github.com/koute/memory-profiler

How does this one compare to Heaptrack (which is a CLI/GUI memory profiler that supports C++ and probably Rust as well)?
There are many differences, but the main ones are that it has less overhead when profiling, more thorough analysis features (Heaptrack's GUI is relatively simple compared to it), and the next version will have scripting capabilities for analysis.
Hmm, took a look (it's called Bytehound now). It has no PKGBUILD and no `cargo install` crate, so I can't install it into a systemwide or user PATH, and it requires Yarn to download and build JS dependencies (likely hundreds or thousands).

I tried `cargo install --git https://github.com/koute/bytehound.git`, but that results in "error: multiple packages with binaries found: bytehound-cli, bytehound-gather, interrupt, linking, lz4-compress, simulation".

For the time being I'll stick with heaptrack.

> But allocation heap profiling is still something I struggle with.

Switch to a dumb allocator and then profile mmap calls or page faults? That should get you large allocations at least. It's a pretty crude proxy. The other allocation profilers I'm aware of cause significant slowdowns.

Sorry I don't understand what you're suggesting.

I currently intercept calls to malloc/calloc/realloc/... and capture stack traces. This way I know how much memory gets allocated for each allocation site. Since allocations usually go through constructor calls, the presence of a constructor in the stack trace lets you infer how many structures of a given type are being allocated. Knowing how big they are is trickier, since allocation for the whole struct and its parts doesn't have to happen entirely in the constructor (some structures like vectors and hashmaps can grow, some structures can collect data from other sources and then hold onto it, etc).
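To make the interception point concrete, here is a minimal sketch in Rust. It is not how heappy does it; `CountingAlloc` and its counters are hypothetical names, and it only counts bytes instead of capturing stack traces (taking a backtrace inside the allocator needs care, because the backtrace code may itself allocate).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that keeps counters.
/// A real profiler would also record a stack trace per (sampled) call.
pub struct CountingAlloc {
    allocated_bytes: AtomicUsize, // cumulative bytes ever handed out
    live_bytes: AtomicUsize,      // bytes handed out and not yet freed
    alloc_calls: AtomicUsize,     // number of successful allocations
}

impl CountingAlloc {
    pub const fn new() -> Self {
        CountingAlloc {
            allocated_bytes: AtomicUsize::new(0),
            live_bytes: AtomicUsize::new(0),
            alloc_calls: AtomicUsize::new(0),
        }
    }
    pub fn allocated_bytes(&self) -> usize {
        self.allocated_bytes.load(Ordering::Relaxed)
    }
    pub fn live_bytes(&self) -> usize {
        self.live_bytes.load(Ordering::Relaxed)
    }
    pub fn alloc_calls(&self) -> usize {
        self.alloc_calls.load(Ordering::Relaxed)
    }
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Forward to the real allocator, then account for the request.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            self.alloc_calls.fetch_add(1, Ordering::Relaxed);
            self.allocated_bytes.fetch_add(layout.size(), Ordering::Relaxed);
            self.live_bytes.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.live_bytes.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}
```

To hook it into a whole program you would declare `#[global_allocator] static A: CountingAlloc = CountingAlloc::new();`. The default `realloc` goes through `alloc` and `dealloc`, so the counters stay consistent without overriding it.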

Furthermore, to know how the live memory is broken down between object types and allocation sites, you also need to track freed memory. This is significantly trickier to do efficiently. I currently take an allocation sample every N bytes allocated and use a Poisson process estimator to scale the total allocated bytes.
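For anyone wondering what the scaling step looks like, here is a rough sketch in Rust. The assumption is that sample points are spread over the allocated byte stream as a Poisson process with a mean of `rate` bytes between samples, so an allocation of `size` bytes gets sampled with probability 1 - exp(-size/rate), and each observed sample stands for 1/p real allocations. As far as I know this is the same form of correction Go applies to its heap profiles. The function names are made up for illustration.

```rust
/// Probability that an allocation of `size` bytes contains at least one
/// sample point, when sample points follow a Poisson process with a mean
/// spacing of `rate` bytes.
fn sample_probability(size: f64, rate: f64) -> f64 {
    1.0 - (-size / rate).exp()
}

/// Scales the sampled (count, bytes) recorded for one allocation site into
/// an estimate of the true (count, bytes) for that site.
/// `rate` is the mean number of bytes between samples.
fn scale_sample(count: u64, bytes: u64, rate: u64) -> (f64, f64) {
    if count == 0 || bytes == 0 {
        return (0.0, 0.0);
    }
    if rate <= 1 {
        // Every allocation is sampled, so nothing needs scaling.
        return (count as f64, bytes as f64);
    }
    // Use the average sampled size at this site to pick the scale factor.
    let avg_size = bytes as f64 / count as f64;
    let scale = 1.0 / sample_probability(avg_size, rate as f64);
    (count as f64 * scale, bytes as f64 * scale)
}
```

Allocations much larger than `rate` are almost always sampled, so their scale factor is close to 1. Small allocations are rarely sampled, so each observed one is scaled up by roughly `rate / size`.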

The only ways I know to account for in-use memory are to track every single allocation, or to add some extra space to every allocation where we record whether the block was sampled and, if yes, what its corresponding allocation event was.
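The "extra space" variant could look roughly like the Rust sketch below. `HeaderAlloc`, `alloc_tagged` and `dealloc_tagged` are hypothetical names, and the convention that a sample id of 0 means "not sampled" is my own assumption. Each block gets a header in front of the user data, padded so the user pointer keeps its requested alignment.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical allocator wrapper that prepends a header to every block.
/// The header holds a sample id; 0 is assumed to mean "not sampled".
pub struct HeaderAlloc;

const HEADER_WORD: usize = std::mem::size_of::<u64>();

/// Returns the enlarged layout plus the offset of the user data inside it.
fn padded(layout: Layout) -> Option<(Layout, usize)> {
    // The offset equals the alignment, so base + offset stays aligned
    // and there is always room for one u64 in front of the user data.
    let align = layout.align().max(HEADER_WORD);
    let offset = align;
    let size = layout.size().checked_add(offset)?;
    Layout::from_size_align(size, align).ok().map(|l| (l, offset))
}

impl HeaderAlloc {
    /// Allocates a block for `layout` and stores `sample_id` in its header.
    /// Safety: same contract as `GlobalAlloc::alloc`.
    pub unsafe fn alloc_tagged(&self, layout: Layout, sample_id: u64) -> *mut u8 {
        let (full, offset) = match padded(layout) {
            Some(v) => v,
            None => return std::ptr::null_mut(),
        };
        unsafe {
            let base = System.alloc(full);
            if base.is_null() {
                return base;
            }
            (base as *mut u64).write(sample_id);
            base.add(offset)
        }
    }

    /// Frees a block from `alloc_tagged` and returns the stored sample id,
    /// so the caller can subtract it from the live profile if it is non-zero.
    /// Safety: `ptr` and `layout` must match the original allocation.
    pub unsafe fn dealloc_tagged(&self, ptr: *mut u8, layout: Layout) -> u64 {
        let (full, offset) = padded(layout).expect("layout was valid at allocation time");
        unsafe {
            let base = ptr.sub(offset);
            let sample_id = (base as *const u64).read();
            System.dealloc(base, full);
            sample_id
        }
    }
}
```

The cost is at least 8 extra bytes per block (more for over-aligned types), paid by every allocation whether or not it was sampled.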

Can you please elaborate more on your suggestion?

Looks like I misunderstood, I was suggesting something more indirect than tracing calls to malloc. If you're already doing that then I'm not aware of any better low-overhead solutions. There are high-overhead options such as valgrind's dhat.
What about valgrind? https://valgrind.org/
Valgrind is great, precise but slow.

The missing feature I was comparing was the ability of Go to provide good estimated heap and allocation profiles using minimal overhead on a production workload.