by mjpa86 1320 days ago
What happened to a memory leak being memory that was allocated but no longer has any reference to it, so it can't be freed? If you can copy the map, release the original, and see memory usage drop, is there really a leak?
4 comments

That's the most used example of a memory leak but it is not the definition of a memory leak.

If you put data into a hash map and forget the key, you leaked.

This is also why valgrind classifies the leaks it reports into categories like "definitely lost", "indirectly lost", "possibly lost" and "still reachable". It would be pretty hard to programmatically determine whether the memory that's still kept around was intended to be kept around or not, which is why valgrind supports generating "suppressions" (and specifying them in subsequent runs so that those reports are ignored).
This is the use case for weak maps, which both Java and JavaScript have. In the latter case, the map is not iterable, so one cannot observe JavaScript GC (through WeakMap at least).
A memory leak has always meant "this program keeps allocating more memory as it runs, even though it's not being asked to store anything new". That is equivalent to saying that a program has a memory leak when it fails to free memory that is no longer needed, not just memory that is no longer reachable.

A common example of a memory leak is adding items to a "cache" that has no mechanism for ever evicting them. The "cache" is thus not a cache, but a memory leak (a common implementation of this leaking scenario is that items are put in a map, but never removed from the map).
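
A minimal Rust sketch of that scenario, not taken from the thread or the article. The names `PackratCache` and `get_or_compute` and the 1 KiB payload are hypothetical, chosen only for illustration. Every entry stays reachable through the map, so no collector or ownership rule will ever reclaim it, even if a key is never looked up again:

```rust
use std::collections::HashMap;

/// Hypothetical "cache" that only ever grows: there is no eviction path.
pub struct PackratCache {
    entries: HashMap<u64, Vec<u8>>,
}

impl PackratCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the payload for `key`, allocating and storing it on a miss.
    /// Nothing is ever removed, so every distinct key costs 1 KiB forever.
    pub fn get_or_compute(&mut self, key: u64) -> &[u8] {
        let payload: &Vec<u8> = self.entries.entry(key).or_insert_with(|| vec![0u8; 1024]);
        payload
    }

    /// Number of entries currently held.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Calling `get_or_compute` with 1000 distinct keys leaves 1000 entries in the map. A real cache would need some bound (size limit, TTL, LRU) to stop that growth.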

Memory leak has never, as far as I know, referred only to the specific case of memory that is no longer accessible from program code and so cannot be freed. In fact, this definition doesn't even make sense from a runtime system perspective, since even in C, the memory is actually always still reachable - from malloc()'s internal data structures.

Those pretty much can't happen in garbage collected languages, so the usage of the term has been widened to include things like this. I agree it's a shame.

Roedy Green coined the name "packratting" for this modern kind of memory leak: https://www.mindprod.com/jgloss/packratting.html

Memory leaks have always meant "failing to free memory that is no longer needed".

Garbage collection literature often stresses the difference between "no longer needed" and "not reachable", noting that the former cannot be determined automatically (it amounts to solving the halting problem), while the latter can be computed but is only a heuristic for it. So the literature always stresses that garbage collectors can't prevent all memory leaks.

> Memory leaks have always meant "failing to free memory that is no longer needed".

Citation needed - that sounds reasonable, but I have never come across that formulation before.

I read about this in the Garbage Collection Handbook [1], which is an excellent overview of the entire field (at least up to ~2016) and discusses the distinction at length. I don't have it on me to quote, but a very clear distinction is made between "live objects" and "reachable objects", with reachability acting as a computable proxy for the uncomputable property of liveness. Liveness is defined as "this object will be used again by the program in some way", and a memory leak is defined as "failing to free an object that is no longer live". An unreachable object can't be live, but there are many ways of having a reachable object that is not live.
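
To make the reachable-vs-live distinction concrete, here is a small Rust sketch. It is my own illustration, not an example from the Handbook, and `Report`, `raw_input` and `release_input` are made-up names. The input buffer is read once while building the report and never again, yet it stays reachable (and allocated) for as long as the `Report` does, unless the program releases it explicitly:

```rust
/// Hypothetical type: keeps its input around after the last real use.
pub struct Report {
    raw_input: Vec<u8>, // reachable for the Report's lifetime, but dead after `build`
    checksum: u64,
}

impl Report {
    /// The only place `raw_input` is actually read.
    pub fn build(raw_input: Vec<u8>) -> Self {
        let checksum = raw_input.iter().map(|&b| u64::from(b)).sum::<u64>();
        Self { raw_input, checksum }
    }

    pub fn checksum(&self) -> u64 {
        self.checksum
    }

    /// Bytes still held by the no-longer-live buffer.
    pub fn retained_bytes(&self) -> usize {
        self.raw_input.capacity()
    }

    /// The program knows the buffer is dead; the runtime cannot infer it.
    pub fn release_input(&mut self) {
        self.raw_input = Vec::new(); // drops the old allocation
    }
}
```

Until `release_input` is called, the buffer is exactly the "reachable but not live" case: no tracing collector would free it either.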

To prove that this is used in the literature at large, here is the abstract of a random GC paper I found [0]:

> Functional languages manage heap data through garbage collection. Since static analysis of heap data is difficult, garbage collectors conservatively approximate the liveness of heap objects by reachability i.e. every object that is reachable from the root set is considered live. Consequently, a large amount of memory that is reachable but not used further during execution is left uncollected by the collector.

[0] https://dl.acm.org/doi/10.1145/3381898.3397208

[1] https://www.google.com/books/edition/_/TKOfDQAAQBAJ?hl=en&gb... (you may try to search for live/reachable/leak to get some idea here as well)

I've also seen the term "memory bloat".
Yeah, that makes the title pretty much clickbait, because a memory leak in a memory-safe language would really be a big deal...
> a memory leak in a memory-safe language would really be a big deal...

It is not.

Let me show you a memory leak in a memory-safe language, Rust:

    let vec: Vec<u8> = Vec::with_capacity(1024);
    std::mem::forget(vec);
Let me show you a memory leak in a memory-safe language, Go:

    _ = time.Tick(1 * time.Second)
See the docs for time.Tick in the stdlib, which documents that calling it is a memory leak: https://pkg.go.dev/time@go1.19.3#Tick

You can also, if you want to leak memory in Go, set the environment variable GOGC=off, and there you go, instant memory leak.

Practically any language, memory safe or otherwise, will let you create a memory leak.
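
The `std::mem::forget` call above leaks on purpose. For a leak that can happen by accident in safe Rust, here is a sketch of a reference-counting cycle. The `Node` type and the function names are hypothetical, used only for illustration. A `Weak` handle is returned so the caller can check whether the allocation survived:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical node that may point at another node.
pub struct Node {
    pub next: RefCell<Option<Rc<Node>>>,
}

/// Two nodes that point at each other: when the locals go out of scope,
/// each node still holds a strong reference to the other, so neither is freed.
pub fn leak_cycle() -> Weak<Node> {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b)); // closes the cycle
    Rc::downgrade(&a)
}

/// Same shape without the back edge: both nodes are freed on return.
pub fn no_cycle() -> Weak<Node> {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    let weak_a = Rc::downgrade(&a);
    drop(b); // releases b and, with it, b's strong reference to a
    weak_a
}
```

`leak_cycle().upgrade()` still yields a node after the function has returned, while `no_cycle().upgrade()` yields `None`. No `unsafe` is involved in either case.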

Plenty easy to leak memory in memory-safe languages. I'm assuming we're including GC-ed languages in that set.
Go is just memory safe until you have a race, or so I have heard.
Indeed, this person showed that you can read/write to arbitrary memory addresses inside of a Go program: https://blog.stalkr.net/2022/01/universal-go-exploit-using-d...

Although, it's pretty useless as an exploit, since it requires you to be able to run arbitrary Go code to begin with (the author admits as much). It's _very_ unlikely that a remote attacker could exploit a data race in a regular Go program.

Every GC language is by definition memory safe. Memory safety in programming does not mean that accessing the same resource from two threads is safe.
I don't know how it works in other languages, but accessing a partially overwritten slice in Go (as will happen in the presence of data races) can cause your code to access out-of-bounds memory. And as we all know, once you have read/write access to arbitrary areas in memory, you've basically opened up Pandora's box.
Go is memory safe; if it isn't, then neither are Java/C#/Python/Ruby.
I don't think you can have data races (but certainly you can have race conditions) in Python because of the GIL. I imagine Ruby is similar. Otherwise, no, the other languages you listed are not "memory safe". Once you start reading and writing to arbitrary locations in a process, almost anything can happen. But certainly you can say that there are different degrees of memory safety. All of the languages you mentioned are leaps and bounds above C/C++.
The same goes for Rust and most other "safe" languages. They all have synchronization primitives that make it safe, but you need to use them - the compiler won't always tell you.
For Rust specifically, the compiler does force safe programs to have no data races. That's actually what the ownership system, Send and Sync are about. If you manage to corrupt memory or have undefined behavior in safe Rust, that should be a compiler or library bug.

See https://doc.rust-lang.org/nomicon/races.html
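
A minimal sketch of what that looks like in practice. The function name `parallel_count` is mine and purely illustrative. Shared mutable state has to go through types that are `Send`/`Sync`, here `Arc<Mutex<u64>>`. Swapping in `Rc<RefCell<u64>>` would be rejected at compile time, because `Rc` is not `Send` and `thread::spawn` requires it:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter `increments` times.
pub fn parallel_count(threads: usize, increments: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter); // each thread owns its own handle
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock is the only way to reach the u64 inside.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

With 4 threads and 1000 increments each, the result is always 4000. There is no interleaving in which an update gets lost.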

That is basically the entire shtick of Rust. Data is "owned", and only the owner (or whoever holds an exclusive mutable borrow) can write to it. You can "borrow" something for read access, but while it is borrowed it can't be written to.

There are of course workarounds for this like reference counted wrappers and so on.
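
A tiny sketch of the borrowing rule described above, with hypothetical function names (`total`, `push_twice`, `demo`). The commented-out line is the kind of write the compiler rejects while a read borrow is still in use:

```rust
/// Read-only access through a shared borrow.
pub fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Write access requires an exclusive (mutable) borrow.
pub fn push_twice(values: &mut Vec<i32>, x: i32) {
    values.push(x);
    values.push(x);
}

pub fn demo() -> i32 {
    let mut data = vec![1, 2, 3];

    let view = &data; // shared borrow for reading
    let before = total(view);
    // data.push(4); // would not compile if `view` were still used after this line

    push_twice(&mut data, 4); // fine: `view` is no longer used
    before + total(&data)
}
```

The read borrow ends at its last use, so the later mutable borrow is accepted. Keeping `view` alive across the write is what the compiler refuses.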

I have no idea what you mean here. Data races are next to impossible in safe Rust.