by WorldMaker 272 days ago
> From what I can tell, DATAS basically makes a .Net application have a normal memory footprint.

In Server environments. DATAS is an upgrade to garbage collection in "Server mode". Server GC assumed it was the only thing running on the machine and could use as much memory as it wanted, so it would readily allocate far more memory than it immediately needed. (As the article points out, it would start at a large fixed amount of memory times the number of CPU cores.)

(As opposed to "Workstation GC" which has always tried to minimize memory consumption because it assumes it is running as only one of many apps on an end user system.)

> (and often run out of memory when run on Mono with the Boehm garbage collector.)

Not exactly a fair comparison between .NET's actual GC and Mono's old simpler GC before the merger. (Today's .NET shares the same GC on Windows and Linux [and macOS].)

> This is one of my big frustrations with .net, (although I tend to look at how dependency injection is implemented as a bigger culprit.)

Startup times have gotten a lot better in recent versions of .NET, and AOT compiling has much improved (especially compared to the ancient ngen, for anyone old enough to remember needing it for startup optimization). While I agree .NET has seen a lot of terrible DI implementations, the out-of-the-box one in Microsoft.Extensions now does a lot of things right, including avoiding a lot of Reflection in standard usage, which was the big thing slowing down older DI systems. (I've seen people add Reflection-based "helpers" back on top of the Microsoft.Extensions DI, but at that point that is a user problem more than a DI problem.)

> It does make me wonder: How practical is it to just use traditional reference counting and then periodically do a mark-and-sweep?

Technically the "mark" of "mark-and-sweep" can be implemented as traditional reference counting (and some of the earliest "mark-and-sweep" implementations did just that). It still only solves half the problem, though. Also, the optimizations made by modern "mark" systems come from the fact that you don't need detailed counts. You just need tools equivalent to Bloom filters (what's the probability this is referenced at least once), which can be much faster to compute and use a lot less memory than reference counters.

If your concern is total memory consumption, traditional reference counting uses more space (if only just to store counts), and by itself doesn't solve fragmentation (the "sweep" part of "mark-and-sweep"). From a practical standpoint, combining "traditional reference counting" and a "mark-and-sweep" sounds to me like asking for a less efficient "mark-twice-and-sweep" algorithm.
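
For a concrete picture of the hybrid being discussed: CPython is one runtime that pairs reference counting with a periodic cycle-collecting pass. The following is a minimal sketch that assumes CPython specifically (other Python implementations free objects on a different schedule); `Node` and `demo` are made-up names for illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


def demo():
    # Turn off the automatic cycle collector so only reference counting runs
    # until we explicitly ask for a collection.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        # Case 1: no cycle. Dropping the last reference frees it at once.
        lone = Node()
        lone_ref = weakref.ref(lone)
        del lone
        freed_by_refcount = lone_ref() is None

        # Case 2: a two-object cycle. Each object keeps the other's count
        # above zero, so reference counting alone never frees them.
        a, b = Node(), Node()
        a.other, b.other = b, a
        a_ref = weakref.ref(a)
        del a, b
        cycle_survives_refcount = a_ref() is not None

        # A tracing pass finds the unreachable cycle and reclaims it.
        gc.collect()
        cycle_freed_by_collector = a_ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, cycle_survives_refcount, cycle_freed_by_collector
```

On CPython, `demo()` should report that the lone object is freed immediately, that the cycle outlives its last external reference, and that the cycle only goes away after `gc.collect()`. Note that both mechanisms pay their own bookkeeping cost, which is the "less efficient" concern raised above.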


See https://news.ycombinator.com/item?id=45360318 (if you didn't read it already)

The important point:

> i.e., the critical difference is that reference counting frees memory immediately, albeit at a higher CPU cost and needing to still perform a mark-and-sweep to clear out cyclic references.

Regarding:

> If your concern is total memory consumption, traditional reference counting uses more space (if only just to store counts)

But it also frees memory immediately, meaning that many processes will appear to use less memory (unless fragmentation is an issue.)

Don't forget that GC often adds memory overhead too: i.e., mark-and-sweep sets a generation counter in each object that it can reach, and then objects that weren't updated are reclaimed.

> But it also frees memory immediately, meaning that many processes will appear to use less memory (unless fragmentation is an issue.)

I think where we disagree is that I of course do assume fragmentation is an issue, and also maybe on what "immediately" means in this case. The type of total memory consumption that matters when you look in, say, Task Manager only changes when entire pages of memory are returned to the OS, not when individual objects are marked unused/free. In practice, fragmentation will always delay entire pages returning to the OS. Reference counted languages build in a lot of tricks to avoid fragmentation, sure, but if you are also trying to use a "mark-and-sweep" heap you lose most of those optimizations, in part because you are then already assuming fragmentation is a problem to solve.
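
A toy model of the page-level point, assuming a made-up allocator that packs fixed-size objects into pages in allocation order and can only hand a page back once every slot in it is free. The function name `returnable_pages` and the sizes are hypothetical; this is not how any real runtime lays out its heap.

```python
def returnable_pages(num_objects, page_size, freed):
    """Count pages in which every object slot has been freed.

    Toy model: object i lives in page i // page_size, and a page can only
    be returned to the OS when all of its slots are free.
    """
    num_pages = (num_objects + page_size - 1) // page_size
    count = 0
    for page in range(num_pages):
        start = page * page_size
        end = min(start + page_size, num_objects)
        if all(slot in freed for slot in range(start, end)):
            count += 1
    return count


# 1024 objects, 64 per page, so 16 pages in total.
# Both scenarios free exactly half of the objects.
scattered = set(range(0, 1024, 2))   # every other object is freed
contiguous = set(range(0, 512))      # the first half is freed

scattered_pages = returnable_pages(1024, 64, scattered)
contiguous_pages = returnable_pages(1024, 64, contiguous)
```

In this model the scattered case returns no pages at all even though half the objects are gone, while the contiguous case returns 8 of the 16 pages. That gap is what compaction (or an allocator designed around it) is trying to close.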

> Don't forget that GC often adds memory overhead too: i.e., mark-and-sweep sets a generation counter in each object that it can reach

I did mention it, but also that GCs have advanced from "include a generation counter in each object" to things like generation bitmaps, where that data is stored outside of the objects themselves, and from there further optimized into even more "compressed" forms à la Bloom filters. They maybe don't track every object but every cluster of objects, or just objects crossing generation boundaries, or they use hash buckets and probability analysis. Many of these structures don't need to be permanent and exist only during specific types of garbage collections. There has been a lot of work in the space, with many decades of efficiencies studied and built. It's still overhead, but it is now a very different class of overhead from reference counts.
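
To make the "stored outside of the objects themselves" part concrete, here is a minimal sketch of a side mark bitmap. It assumes objects are identified by small integer indexes and the reference graph is a plain dict; both are simplifications invented for illustration, and real collectors such as .NET's are far more involved.

```python
def mark(roots, edges, num_objects):
    """Mark everything reachable from roots, one bit per object.

    The mark state lives in a separate bytearray rather than in a
    per-object header field, so it costs num_objects / 8 bytes in total.
    """
    bitmap = bytearray((num_objects + 7) // 8)
    stack = list(roots)
    while stack:
        index = stack.pop()
        byte, bit = divmod(index, 8)
        if bitmap[byte] & (1 << bit):
            continue  # already visited
        bitmap[byte] |= 1 << bit
        stack.extend(edges.get(index, ()))
    return bitmap


def is_marked(bitmap, index):
    """Return True if the object at index was reached during marking."""
    return bool(bitmap[index // 8] & (1 << (index % 8)))


# Hypothetical heap of 5 objects: 0 -> 1 -> 2 is reachable from the root,
# while 3 <-> 4 is an unreachable cycle.
edges = {0: [1], 1: [2], 3: [4], 4: [3]}
bitmap = mark(roots=[0], edges=edges, num_objects=5)
unmarked = [i for i in range(5) if not is_marked(bitmap, i)]
```

Here five objects need a single byte of mark state, and the unreachable cycle (objects 3 and 4) simply never gets its bits set. The bitmap can also be thrown away after the collection, which is the "transient" property mentioned above.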

> It's still overhead, but it is now a very different class of overhead from reference counts.

Yes, I'm very aware of that.

Remember, my question about reference counting's practicality in C# is more of a rhetorical question to encourage discussion.