Hacker News new | ask | show | jobs
by GolDDranks 440 days ago
Just having features associated with OOP isn't a bad thing. Object upcasting has its uses.

It's some of the other stuff that gets OOP its bad rap.

Garbage collection, common in many OOP languages, enables having arbitrary object graphs. The programmer doesn't need to think hard about who has a reference to who, and how long these references live. This leads to performance-defeating pointer chasing, and fuzzy, not-well-thought-of object lifetimes, which in turn lead to accidental memory leaks.

Additionally, references in major languages are unconditionally mutable. Having long-lived mutable references from arbitrary places makes it hard to track object states. It makes easier to end up in unexpected states, and accidentally violate invariants from subroutine code, which means bugs.

Also, class-like functionality conflates data-layout, privacy/encapsulation, API, namespacing and method dispatch into a single thing. Untangling that knot and having control over these features separately might make a better design.

3 comments

I agree this is useful. I also think it isn’t the end of the world to support some semblance of OOP in Rust.

If it helps you ship the business logic, sometimes it’s okay to concede some performance or other cost.

"sometimes" is doing a lot of work there.

Sometimes it is, but also sometimes it isn't, and Rust at least gives you a choice (you can use Arc all over the place, or if performance is critical you can be more careful about specifying lifetimes)

Arc doesn’t give you the choice until this upcasting feature lands in stable.
You make some really good criticisms of OOP language design. I take issue with the part about garbage collecting, as I don't think your points apply well to tracing garbage collectors. In practice, the only way to have "memory leaks" is to keep strong references to objects that aren't being used, which is a form of logic bug that can happen in just about any language. Also good API design can largely alleviate handling of external resources with clear lifetimes (files, db connections, etc), and almost any decent OOP languages will have a concept of finalizers to ensure that the resources aren't leaked.
Finalizers are crap. Having some sort of do-X-with-resource is much better but now you're back to caring about resource ownership and so it's reasonable to ask what it was that garbage collection bought you.

I agree with your parent that these sort of accidental leaks are more likely though of course not uniquely associated with, having a GC so that you can forget who owns objects.

Suppose we're developing a "store" web site with an OO language - every Shopper has a reference to Previous Purchases which helps you guide them towards products that complement things they bought before or are logical replacements if they're consumables. Now of course those Previous Purchases should refer into a Catalog of products which were for sale at the time, you may not sell incandescent bulbs today but you did in 2003 when this shopper bought one. And of course the Catalog needs Photographs of each product as well as other details.

So now - without ever explicitly meaning this to happen - when a customer named Sarah just logged in, that brought 18GB of JPEGs into memory because Sarah bought a USB mouse from the store in spring 2008, and at the time your catalog included 18GB of photographs. No code displays these long forgotten photographs, so they won't actually be kept in cache or anything, but in practical terms you've got a huge leak.

I claim it is easier to make this mistake in a GC language because it's not "in your face" that you're carrying around all these object relationships that you didn't need. In a toy system (e.g. your unit tests) it will work as expected.

Using a website is a strange example because that’s not at all how web services are architected. The only pieces that would have images in memory is a file serving layer on the server side and the user's browser.
> I claim it is easier to make this mistake in a GC language because it's not "in your face" that you're carrying around all these object relationships that you didn't need. In a toy system (e.g. your unit tests) it will work as expected.

I simply don't see how these mistakes would be any less obvious than in an equally complex C codebase, for instance.

Agree on Finalizers being crap. Do x with resources is shit too though because I don't want indentation hell. Well I guess you could solve it by introducing a do-x-with variant that doesn't open a new block but rather attaches to the surrounding one.
> Finalizers are crap. Having some sort of do-X-with-resource is much better but now you're back to caring about resource ownership and so it's reasonable to ask what it was that garbage collection bought you.

What garbage collection here brings you, and what it has always brought you here, is to free you from having to think about objects' memory lifetimes. Which are different from other possible resource usages, like if a mutex is locked or whatnot.

In fact, I'd claim that conflating the two as is done for example with C++'s RAII or Rust Drop-trait is extremely crap, since now memory allocations and resource acquisition are explicitly linked, even though they don't need to be. This also explains why, as you say, finalizers are crap.

Things like Python's context managers (and with-blocks), C#'s IDisposable and using-blocks, and Java's AutoCloseables with try-with-resources handle this in a more principled manner.

> I agree with your parent that these sort of accidental leaks are more likely though of course not uniquely associated with, having a GC so that you can forget who owns objects. > > Suppose we're developing a "store" web site with an OO language - every Shopper has a reference to Previous Purchases which helps you guide them towards products that complement things they bought before or are logical replacements if they're consumables. Now of course those Previous Purchases should refer into a Catalog of products which were for sale at the time, you may not sell incandescent bulbs today but you did in 2003 when this shopper bought one. And of course the Catalog needs Photographs of each product as well as other details.

Why are things like the catalogues of previous purchases necessary to keep around? And why load them up-front instead of loading them lazily if you actually do need a catalogue of incandescent light bulbs from 2003 for whatever reason?

> So now - without ever explicitly meaning this to happen - when a customer named Sarah just logged in, that brought 18GB of JPEGs into memory because Sarah bought a USB mouse from the store in spring 2008, and at the time your catalog included 18GB of photographs. No code displays these long forgotten photographs, so they won't actually be kept in cache or anything, but in practical terms you've got a huge leak.

I'm sorry, what‽ Who is this incompetent engineer you're talking about? First of all, why are we loading the product history of Sarah when we're just showing whatever landing page we show whenever a customer logs in? Why are we loading the whole thing, instead of say, the last month's purchases? Oh, and the elephant in the room:

WHY THE HELL ARE WE LOADING 18 GB OF JPEGS INTO OUR OBJECT GRAPH WHEN SHOWING A GOD-DAMN LOGIN LANDING PAGE‽ Instead of, you know, AT MOST JUST LOADING THE LINKS TO THE JPEGS INSTEAD OF THE ACTUAL IMAGE CONTENTS!

Nothing about this has anything to do with whether a language implementation has GC or not, but whether the hypothetical engineer in question, that wrote this damn thing, knows how to do their job correctly or not.

> I claim it is easier to make this mistake in a GC language because it's not "in your face" that you're carrying around all these object relationships that you didn't need. In a toy system (e.g. your unit tests) it will work as expected.

I don't know, if the production memory profiler started saying that there's occasionally a spike of 18 GB taken up by a bunch of Photograph-objects, that would certainly raise some eyebrows. And especially since in this case they were made by an insane person, that thought that storing the JPEG data in the objects themselves is a sane design.

---

As you said above, this is very much not unique to language implementations with GC. Similar mistakes can be made for example in Rust. And you may say that no competent Rust developer would do this kind of a mistake. And that's almost certainly true, since the scenario is insane. But looking at the scenario, the person making the shopping site was clearly not competent, because they did, well, that.

Why do you treat memory allocations as a special resource that should have different reasoning about lifetime than something like a file resource, a db handle, etc etc? Sure if you don’t care about how much memory you’re using it solves a small niche of a problem for a lot of overhead (both CPU and memory footprint) but Rust does a really good job of making it easy (and you rarely / never really have to implement Drop).

The context managers and stuff are a crutch and admittance that the tracing GC model is flawed for non memory use cases.

Memory is freed as soon as the program closes by the operating system. If you have an open connection to a database they might expect you to talk to them before you close the handle (for example, to differentiate between crash and normal shutdown). Any resource shared between programs might have a protocol that needs to be followed which the operating system might not do for you.

The GC model only cares about memory, so I don't really understand what you mean by "flawed for non memory use cases". It was never designed for anything other than managing memory for you. You wouldn't expect a car to be able to fly, would you?

I personally like the distinction between memory and other resources. If you look hard enough I'm sure you'll find ways to break a model that conflates the two. Similar to this, the "everything is a file" model breaks down for some applications. Sure, a program designed to only work with files might be able to read from a socket/stream, but the behavior and assumptions you can make vary. For example, it is reasonable to assume that a file has a limited size. Imagine you're doing a "read_all" on some video-feed because /dev/camera was the handle you've been given. I'm sure that would blow up the program.

In short, until there is a model that can reasonably explain why memory and other resources can be homogenized into one model without issue, I believe it's best to accept the special treatment.

> to differentiate between crash and normal shutdown

> Any resource shared between programs might have a protocol that needs to be followed which the operating system might not do for you.

Run far and quickly from any software that attempts to do this. It’s a sure fire indication of fragile code. At best such things should be restricted to optimizing performance maybe if it doesn’t risk correctness. But it’s better to just not make any assumptions about reliable delivery indicating graceful shutdown or lack of signal indicating crash.

> In short, until there is a model that can reasonably explain why memory and other resources can be homogenized into one model without issue, I believe it's best to accept the special treatment.

C++ and Rust do provide compelling reasons why memory is no different from other resources. I think the counter is the better position - you have to prove that a memory resource is really different for the purposes of resource management than anything else. You can easily demonstrate why network and files probably should have different APIs (drastically different latencies, different properties to configure etc). That’s why network file systems generally have such poor performance - the as-if pretension sacrifices a lot of performance.

The main argument tracing GC basically makes is that it’s ok to be geeedy and wasteful with RAM because there’s so much of it that retaining it extra is a better tradeoff. Similarly it argues that the cycles taken by GC and random variable length pauses it often generates don’t matter most of the time. The counter is that while it probably doesn’t matter in the P90 case, it does matter when everyone takes this attitude and P95+ latencies (depending on how many services are between you and the user you’d be surprised how many 9s of good latency your services have to achieve for the eyeball user to observe an overall good P90 score).

> For example, it is reasonable to assume that a file has a limited size. Imagine you're doing a "read_all" on some video-feed because /dev/camera was the handle you've been given. I'm sure that would blow up the program.

Right, which is one of many reasons why you shouldn’t ever assume files you’re given are finite length. You could be passed stdin too. Of course you can make simplifications in cases, but that requires domain-specific knowledge, something tracing GCs do not have because they’re agnostic to use case.

Because 99.9% of the time the memory allocation I'm doing is unimportant. I just want some memory to somehow be allocated and cleaned up at some later time and it just does not matter how any of that happens.

Reasoning about lifetimes and such would therefore be severe overkill and would occupy my mental faculties for next to zero benefit.

Cleaning up other resources, like file handles, tends to be more important.

Then stick it in a Box, RC or Arc and forget about it. You only need lifetimes for cases where you want to avoid heap allocations.
> Garbage collection, common in many OOP languages

I would argue this is correlation, not causation. And of the many flaws one can raise with OOP, that GC is pretty low on that list.