by nickez 887 days ago
Please read the article. His code produces 200x intermediary values. Clickbait title, since it wasn't a leak.
3 comments

I agree that this is not a memory leak.

However, the semantic distinction between "this uses much more memory than expected" and "this is a memory leak" is a little subtle, and it seems pretty rude to call it clickbait.

No, memory leak has a very distinct definition: memory that is unused and still stored, but inaccessible. A memory leak can be as small as a single word. In this case, it's just memory usage. There is another term for this scenario, which I don't remember.

This is a case of optimization gone wrong, but nothing is leaked, and every single byte is accounted for.

The title is clickbait, but the article is still interesting to read.

Clickbait (in the context of Rust). In languages with managed memory there are no true memory leaks so such wastes are called leaks. In lower-level languages, we should stay more strict with what we call things.
… Box::leak¹ is a function that exists. That seems like a memory leak, no?

Less tongue-in-cheek, if a program allocates far more memory than expected of it, I'm going to colloquially call that a "memory leak". If I see a Java program whose RSS is doing nothing but "up and to the right" until the VM runs out of memory and dies a sweet sweet page thrashing death, I'm going to describe that as a "memory leak". Having someone tell me, "well, actually, it's not a leak per se it's just that the JVM's GC didn't collect all the available garbage prior to running out of memory because …" … I don't care? You're just forcing me to wordsmith the problem description; the problem is still there. Program is still dead, and still exceeding the constraints of the environment it should have been operating in.

The author had some assumptions: that Vec doesn't overalloc by more than 2x, and that collect allocates — one of those did turn out to be false, but I think if I polled Rust programmers, a fair number of them would make the wrong assumption. I would, and TIL from this article that it was wrong, and that collect can reuse the original allocation, despite it not being readily apparent how it knows how to do that with a generic Iterator. (And, the article got me to understand that part, too!)

Unlike most clickbaits which lure you in only to let you down, I learned something here. Reading it was worthwhile.

¹https://doc.rust-lang.org/stable/std/boxed/struct.Box.html#m...
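
For anyone who hasn't used it, here is a minimal sketch of what `Box::leak` does. The helper name `leak_value` is made up for illustration. Note that the leaked memory stays reachable through the returned reference, so it is "leaked" only in the sense that nothing will ever free it.

```rust
/// Hypothetical helper: deliberately leaks a heap allocation and returns
/// a `'static` mutable reference to the value stored in it.
fn leak_value(value: u32) -> &'static mut u32 {
    // Box::leak consumes the Box without running its destructor,
    // so the allocation is never returned to the allocator.
    Box::leak(Box::new(value))
}

fn main() {
    let leaked = leak_value(41);
    *leaked += 1;
    println!("leaked value: {}", leaked);
    // No owner remains, so this allocation lives until the process exits.
}
```

Once the last copy of that reference is gone, the allocation becomes the unused, stored, inaccessible kind of memory that the strict definition upthread describes.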

> "If I see a Java program whose RSS is doing nothing but "up and to the right" until the VM runs out of memory and dies a sweet sweet page thrashing death, I'm going to describe that as a "memory leak"."

By this definition, if a program reads in a file and you point it to a small file then the program does not have a memory leak, but if you point it to a large enough file, then the program does have a memory leak. Whether or not a program has a memory leak doesn't depend on the code of the program, but how you use it. But then on a bigger computer, the program doesn't have a memory leak anymore.

That seems a less useful definition than the parent poster's / the common definition.

… that's really not the idea I'm trying to convey with the comment.

Clearly, if you feed a program a larger file that it is going to read into memory to process, it is then expected that it will consume more resources on account of it doing more work. But that is memory being expended on visible, useful work. All of the examples in the comment are referring to memory being "allocated" (in the sense of being assigned to the program) but not fulfilling any visibly useful function insofar as the operator/programmer can see: Java's GC being unable to effectively reclaim unused memory prior to killing a machine, the OP's example of a Vec allocating without (seemingly) having a purpose (…as it is in excess of what is required to allow for amortized appends).

There is an implied steady state in what the program is doing. If it goes right from "loading" to "exit" then you need a more complicated analysis.

When you have that steady state, that definition looking at uncontrolled growth is more useful than trying to dissect whether the memory is truly unreachable or only practically unreachable.

>I don't care? You're just forcing me to wordsmith the problem description

Yes, because if you don't define the problem clearly, the problem won't be solved. Java being inefficient with memory use doesn't mean any memory was leaked.

Memory leaks can be tricky to track down, and if I spent 6 hours looking for a memory leak only to come back and find out you meant it uses more memory than what's efficient, I'd be pissed I wasted 6 hours because you wanted to save 5 minutes.

"uses more memory than what's efficient"

There is a hidden memory store using orders of magnitude more RAM than the live data. Why do we need to nitpick exactly how hidden it is? Are you going to be mad if I don't know whether it's literally inaccessible or not?

Because there are legitimate reasons why memory can be allocated. This is like calling your OS cache a memory leak when you open up Task Manager and see you only have 400MB free. A memory leak implies memory that is lost for good and is no longer being kept track of.

Consider it this way - if I had a program that connected to a database and used a connection pool to improve performance, would it be a "connection leak" that 5 connections were opened even though the database was idle?

The framing here is similar - Rust, in an attempt to improve performance, reused large memory allocations. Some applications do this on purpose and call it buffer pools.
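
A minimal sketch of deliberate buffer reuse, to show the pattern being compared to. The helper name `reuse_buffer` and the sizes are made up for illustration. It relies on the documented behavior that `Vec::clear` has no effect on capacity.

```rust
/// Hypothetical helper: refills an existing buffer instead of allocating
/// a new one, and returns the capacity afterwards.
fn reuse_buffer(buffer: &mut Vec<u8>, payload: &[u8]) -> usize {
    // clear() drops the contents but keeps the allocation.
    buffer.clear();
    // Fits within the existing capacity here, so no reallocation is needed.
    buffer.extend_from_slice(payload);
    buffer.capacity()
}

fn main() {
    let mut buffer: Vec<u8> = Vec::with_capacity(1024);
    let initial = buffer.capacity();
    let after = reuse_buffer(&mut buffer, b"hello");
    println!("len = {}, capacity {} -> {}", buffer.len(), initial, after);
}
```

The buffer holds 5 bytes while keeping its full allocation. That is intended in a pool, and surprising when it happens implicitly.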

I disagree that it’s a small semantic difference.

I don’t think it’s clickbait though, I think the author was just misusing terminology.

I said it was subtle, not small. I agree it's a valuable distinction.

A memory leak means it leaks: it's no longer under control. Here the memory is under control; it can be reclaimed by the program.

I did read the article.

> the memory waste from excess capacity should always be at most 2x, but I was seeing over 200x.

So the 200x analysis is his problem?

200x is correct. What's happening is that he makes a vector with tons of capacity and only a few elements, so lots of wasted space. Then he turns that vector into another vector using an operation that used to allocate a new vector (thus releasing the wasted space) but now reuses the previous vector's allocation (retaining the wasted space).

It's definitely a sneaky bug. Not a "memory leak" in the normal sense since the memory will still be freed eventually. I'd call it an unexpected waste of memory.
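
A rough sketch of the shape being described, not the article's actual code. The helper name `collect_capacities` and the sizes are assumptions. Whether the capacity is actually carried over depends on the Rust version and on standard-library internals, so the sketch only prints what it observes.

```rust
/// Hypothetical helper: builds a Vec with far more capacity than elements,
/// then maps and collects it into a new Vec.
/// Returns (length, capacity before collect, capacity after collect).
fn collect_capacities(reserved: usize, kept: usize) -> (usize, usize, usize) {
    let mut source: Vec<u64> = Vec::with_capacity(reserved);
    source.extend(0..kept as u64);
    let before = source.capacity();

    // Vec -> into_iter -> map -> collect into Vec is the kind of pipeline
    // where the standard library may reuse the source allocation.
    let result: Vec<u64> = source.into_iter().map(|x| x + 1).collect();
    (result.len(), before, result.capacity())
}

fn main() {
    let (len, before, after) = collect_capacities(10_000, 5);
    println!(
        "len = {}, capacity before = {}, capacity after collect = {}",
        len, before, after
    );
}
```

If the printed "after" capacity is close to the "before" capacity, the allocation was reused and the excess came along with it.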

Rust can re-use an allocation, but if the new item is smaller than the previous it doesn't automatically remove (free) the "wasted" memory left over from the previous allocation. I think this is categorically not a memory leak as the memory was absolutely accounted for and able to be freed (as evidenced by the `shrink_to_fit()`), but I can see how the author was initially confused by this optimization.
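
A small sketch of the `shrink_to_fit()` point, with a made-up helper name and sizes. The docs only promise that capacity drops as close as possible to the length, so the sketch reports the numbers without assuming an exact value.

```rust
/// Hypothetical helper: over-allocates a Vec, then asks it to give the
/// excess back. Returns (length, capacity before, capacity after).
fn shrink_demo(reserved: usize, kept: usize) -> (usize, usize, usize) {
    let mut v: Vec<u8> = Vec::with_capacity(reserved);
    v.extend(std::iter::repeat(0u8).take(kept));
    let before = v.capacity();

    // The excess is still owned and tracked by the Vec, which is why it
    // can be released on request.
    v.shrink_to_fit();
    (v.len(), before, v.capacity())
}

fn main() {
    let (len, before, after) = shrink_demo(4096, 4);
    println!("len = {}, capacity {} -> {}", len, before, after);
}
```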

The 2x versus 200x confusion, IMO, is that the OP was relying on the fact that Vec doubles in size when it needs more space, so they assumed the memory should only ever have been 2x the new size in the worst case. In the OP's case, because the new type was smaller than the previous one, the reused allocation looked like a massive over-allocation.

Imagine you had a `Vec<Vec<u16>>` and, to keep it simple, there were only 2 elements in both the inner and outer Vecs. If we assume Rust doubled each Vec's allocation, that'd be 4x4 "slots" of 2 bytes per slot (or 32 bytes total allocated... in reality it'd be a little different, but to keep it simple let's just assume).

Now imagine you replace that allocation with a `Vec<Vec<u8>>` which even with the same doubling of the allocation size would be a maximum of 4x4 slots of 1 byte per slot (16 bytes total allocation required). Well we already have a 32 byte allocation and we only need 16, so Rust just re-uses it, and now it looks like we have 16 bytes of "waste."

Now the author was expecting at most 16 bytes (remember, 2x the new size) but was seeing 32 bytes because Rust just re-used the allocation and didn't free the "extra" 16 bytes. Further, when they ran `Vec::shrink_to_fit()` it shrank down to only the used space, which in our example would be a total of 4 bytes (2x2 of 1 byte slots actually used).

Meaning the author was comparing an observed 32 byte allocation, to an expectation of at most 16 bytes, and a properly sized allocation of 4 bytes. Factored out to their real world data I can see how they'd see numbers greater than "at most 2x."
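
The arithmetic of that simplified example, written out. The helper `grid_bytes` is made up, and like the example it ignores Vec headers and allocator details.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes for an outer x inner grid of slots in the
/// simplified model above (no Vec headers, no allocator overhead).
fn grid_bytes(outer_slots: usize, inner_slots: usize, bytes_per_slot: usize) -> usize {
    outer_slots * inner_slots * bytes_per_slot
}

fn main() {
    // 4x4 slots of u16: the original, doubled allocation.
    let observed = grid_bytes(4, 4, size_of::<u16>());
    // 4x4 slots of u8: the "at most 2x" expectation for the new type.
    let expected_max = grid_bytes(4, 4, size_of::<u8>());
    // 2x2 slots of u8: what is actually in use after shrink_to_fit.
    let used = grid_bytes(2, 2, size_of::<u8>());

    println!("observed = {} bytes", observed);
    println!("expected at most = {} bytes", expected_max);
    println!("actually used = {} bytes", used);
    println!("observed / used = {}x", observed / used);
}
```

Even in this toy case the observed allocation is 8x the used space, well past the expected 2x.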

Clickbait implies a specific intent to mislead for clicks, whereas I think there’s a completely good-faith disagreement here about the meaning of the word “memory leak.”