Hacker News new | ask | show | jobs
by dxhdr 1092 days ago
How do you measure cache line utilization with a flame graph? I'm not sure it's as simple as you're describing. Poor memory access patterns from heap allocations can kill performance via death from a thousand cuts. It's not about spending "X% of the time on allocation," it's about every computation spending unnecessary time reading and writing to the cache due to poor choice of allocation strategy. Custom allocators can ensure data layout fits the access pattern.

More specifically, arrays are about the fastest thing around, and handing out objects from within a preallocated array tends to give the best access performance by a large margin. From what I understand you basically have to sidestep the Rust borrow checker to achieve this. Which does raise some interesting questions.

I think the original poster was also saying that heap allocations are slow, which is true, but I agree it'd be easier to tell if your program is having trouble with that.

2 comments

I'm unclear how cache line utilization is relevant here. In both Zig and Rust, the choice of allocator has no effect on the layout of data in a given collection. In other words, this Zig code:

    var foo_list = ArrayList(u8).init(foo_allocator);
    try foo_list.append(1);
    try foo_list.append(2);
    try foo_list.append(3);
...and this Rust code:

    let mut foo_list = Vec::new();
    foo_list.push(1);
    foo_list.push(2);
    foo_list.push(3);
...are both going to produce an array of [1,2,3] stored in the heap. The choice of allocator only affects where that array itself ends up being stored. Fetching an element of foo_list is going to cache the other elements in the list regardless of its location in memory, so the choice of allocator doesn't matter for that purpose. The only thing that could make a difference is if you want multiple different collections to be fetched in the same cache line, but if your collections are so small that multiple collections fit in the same cache line then your data isn't big enough to be worrying about optimizing your cache utilization (and furthermore, even if you carefully lay out your collections in such a way, as soon as any of your growable arrays need to be reallocated that completely throws all of your careful organization out the window).

> More specifically, arrays are about the fastest thing around, and handing out objects from within a preallocated array tends to give the best access performance by a large margin. From what I understand you basically have to sidestep the Rust borrow checker to achieve this.

Where did you get this impression? Rust has stack allocated arrays, and you can hand out references to them just as well as you can to any other owned type. I can think of no distinction that the borrow checker makes between stack- and heap-allocated types.

While I agree with you, my point was specifically the cost of a malloc. As someone who had actually profiled a pool vs malloc, this is kind of a salient point for me.

In some workloads short-lived allocations are common. Like memory is allocated for milliseconds and then deallocated - and X% of time will be spent just inside the allocator trying to find the best block to allocate. At this point you want to make sure that you don't spend more time in the allocator than allocations actually live.