by newswasboring 1908 days ago
I didn't even know julia GC had issues. Care to elaborate?

It doesn’t, it just doesn’t have a $100B GC like Java does. Rather than spending that kind of money trying to compensate for a language design that generates massive amounts of garbage (ie Java), Julia takes the approach of making it easier to avoid generating garbage in the first place, eg by using immutable structures that can be stack allocated and having nice APIs for modifying pre-allocated data structures in place.
I don't think you can allocate immutable data structures on the stack.

I've never seen a 10 million entry immutable set on the stack but I could be wrong.

Obviously a 10-million-element array doesn't get stack allocated. But if individual objects of some type are immutable, then they can be stack allocated, or maybe not allocated at all (kept in registers).

Edit: reading your other post, it seems like you may mean persistent data structures, a la Clojure, rather than immutable structures, which are quite different. The former would indeed always be heap-allocated (it's necessary since they are quite pointer-heavy). Immutable structures, on the other hand, are detached from any particular location in memory.

Moreover, if the elements in an array are mutable, eg Java objects, then each one needs to be individually heap allocated with a vtable pointer and the array has to be an array of pointers to those individually allocated objects. For pointer-sized objects (say an object that has a single pointer-sized field), that takes 3x the memory (the array slot, the vtable pointer and the field itself), so that's already brutal, but worse is that since the objects are all individually allocated, the GC needs to look at every single one, and freeing the space is a fragmentation nightmare. If the objects are immutable (and the type is final; btw all concrete types are final in Julia), then you can store them inline with no overhead and the GC can deal with them as a single big block.
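To make the layout point concrete, here is a minimal Java sketch. `MutablePoint` and `LayoutDemo` are hypothetical names made up for illustration. It only shows the semantics, not the actual memory footprint: because a mutable object can be aliased, the array has to hold references to it, whereas a plain value can simply be copied into the array.

```java
// Hypothetical illustration; MutablePoint and LayoutDemo are made-up names.
final class MutablePoint {
    double x;
    MutablePoint(double x) { this.x = x; }
}

final class LayoutDemo {
    // An array of mutable objects stores references, so two slots can alias one object.
    static double aliasedRead() {
        MutablePoint p = new MutablePoint(1.0);
        MutablePoint[] boxed = { p, p };   // both slots point at the same heap object
        boxed[0].x = 42.0;                 // mutation through slot 0 ...
        return boxed[1].x;                 // ... is visible through slot 1
    }

    // A primitive array stores the values themselves, inline and contiguous.
    static double inlineRead() {
        double x = 1.0;
        double[] flat = { x, x };          // two independent copies of the value
        flat[0] = 42.0;                    // only slot 0 changes
        return flat[1];
    }

    public static void main(String[] args) {
        System.out.println(aliasedRead()); // 42.0
        System.out.println(inlineRead());  // 1.0
    }
}
```

Since mutation through one slot must be observable through every alias, the runtime cannot in general flatten `MutablePoint` into the array, which is the constraint described above.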

Btw, I had to vouch for you to undead your posts in order to reply. Looks like you got downvoted a bunch.

Ouch.
I mean I’m not trying to hate on Java — pointer-heavy programming was all the rage when it was designed, and GC was a hot research topic, so there was good reason to be optimistic about that approach. But it turns out that it’s very hard to make up for generating tons of garbage and pointer-heavy programming hasn’t aged well given the way hardware has evolved (pointers are large and indirection is expensive).
You're mixing two things here: memory management and memory layout. This "pointer-heavy programming" is, indeed, a bad fit for modern hardware in terms of processing speed due to cache misses, which is why even Java is now getting user-defined primitive types (aka inline types, aka value types), but in terms of memory management, recent versions of OpenJDK are pretty spectacular, not only in throughput but also in latency (ZGC in JDK 16 has sub-millisecond maximum pause time for any size of heap and up to a very respectable allocation rate: https://malloc.se/blog/zgc-jdk16 and both throughput and max allocation rate are expected to grow drastically in the coming year with ZGC becoming generational). As far as performance is concerned, GC can now be considered a solved problem (albeit one that requires a complex implementation); the only real price you pay is in footprint overhead.
I'm not — memory layout and memory management are (fairly obviously, I would think) intimately related. In particular, pointer-heavy memory layouts put way more stress on the garbage collector. Java's choice of making objects mutable, subtypeable and have reference semantics, basically forces them to be individually heap-allocated and accessed via pointers. On the other hand, if you design your language so that you can avoid heap allocating lots of individual objects, then you can get away with a much simpler garbage collector. Java only needs spectacular GC technology because the language is designed in such a way that it generates a spectacular amount of garbage.
I would say no. To have stellar performance, you'll need compaction, you'll need parallelism (of GC threads), and you'll need concurrency between the GC threads and mutator threads; and for good throughput/footprint tradeoff you'll need generational collection. True, you might not need to contend with allocation rates that are that high, but getting, say, concurrent compaction (as in ZGC) and/or partial collections (as in G1), requires a sophisticated GC. E.g. Go isn't as pointer-heavy as (pre-Valhalla) Java, and its GC is simple and offers very good latency, but it doesn't compact and it throttles, leading to lower throughput (I mean total program sluggishness) than you'd see in Java, even with a much higher allocation rate. The thing is that even with a low allocation rate, you'd get some challenging heaps, only later, say, every 10 seconds instead of every 5.

It's true that a simpler GC might get you acceptable performance for your requirements if your allocation rate is relatively low, but you still won't get OpenJDK performance. So I'd say that if you design your language to require fewer objects, then you can get by with a simple GC if your performance requirements aren't too demanding.

All that dereferencing puts a higher load on data structure traversal (which is why Java is getting "flattenable" types) than on the GC. The main reason for Java's particular GC challenges isn't its pointer-heavy (pre-Valhalla) design but the mere fact that it is the GCed platform that sees the heaviest workloads and most challenging requirements by far. Java's GC needs to work hard mostly for the simple reason that Java is asked to do a lot (and the better some automated mechanism works, the more people push it).

Hating on Java seems perfectly reasonable to me.
so are you for GC or against GC?

In other posts you actually argue that GCs help you reduce complexity because manual memory management is too much of a hassle.

Maybe "immutable" is not the correct term. Persistent data structures are what I would like support for: that is my use-case.

I think you can have efficient persistent data structures without a GC, but that requires fast reference counting and in turn, that requires a lot of work to be competitive with the JVM.

I also understand that my use-case is not Julia's focus. That's perfectly fine.

That's a major oversimplification. GC is good for ease of use and safety of a high level language. GC is never as performant as not requiring heap allocations at all. Julia has a GC, but also provides a lot of tools to avoid needing the GC in high performance computations. This combination gives ease of use and performance.

Java sacrifices some performance for having this "one paradigm" of all objects, and then heavily invested in the GC, but in many cases, like writing a BLAS, it still just will not match the performance of highly tuned code, whereas in Julia, for example, you can write really fast BLAS code like Octavian.jl.

Julia is multi-paradigm in a way that is purposely designed for how these features compose. I think it's important to appreciate that design choice, in both its pros and cons.

To tie Octavian.jl into this memory allocation discussion:

Octavian uses stack-allocated temporaries when "packing" the left matrix ("A" in "A*B"). These temporaries can have tens of thousands of elements, so that's a non-trivial stack allocation (the memory is mutable to boot). No heap allocations or GC activity needed (just a GC.@preserve to mark its lifetime). If I understand correctly, this isn't something that'd be possible in Java?

To be fair, you can also just use preallocated global memory for your temporaries, since the maximum amount of memory needed is known ahead of time.
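A minimal Java sketch of the preallocated-temporaries idea, assuming a known upper bound on the tile size. `PackDemo`, `MAX_TILE` and `SCRATCH` are hypothetical names, and this is not how Octavian.jl is implemented. The scratch buffer is allocated once and reused, so the hot path creates no garbage.

```java
// Hypothetical sketch; PackDemo, MAX_TILE and SCRATCH are illustrative names, not Octavian's API.
final class PackDemo {
    static final int MAX_TILE = 64 * 64;                   // assumed upper bound on tile size
    static final double[] SCRATCH = new double[MAX_TILE];  // allocated once, reused by every call

    // Copy a rows x cols tile of the row-major matrix a (row stride lda),
    // starting at (r0, c0), into SCRATCH as a contiguous row-major block.
    static void packTile(double[] a, int lda, int r0, int c0, int rows, int cols) {
        if (rows * cols > MAX_TILE) throw new IllegalArgumentException("tile too large");
        for (int i = 0; i < rows; i++) {
            System.arraycopy(a, (r0 + i) * lda + c0, SCRATCH, i * cols, cols);
        }
    }

    // y = tile * x, reading the packed tile and writing into the caller's preallocated y.
    static void tileTimesVector(int rows, int cols, double[] x, double[] y) {
        for (int i = 0; i < rows; i++) {
            double acc = 0.0;
            for (int j = 0; j < cols; j++) acc += SCRATCH[i * cols + j] * x[j];
            y[i] = acc;
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9};  // 3x3 matrix, row-major
        double[] y = new double[2];                 // preallocated output
        packTile(a, 3, 1, 1, 2, 2);                 // tile is [[5, 6], [8, 9]]
        tileTimesVector(2, 2, new double[] {1, 1}, y);
        System.out.println(y[0] + " " + y[1]);      // 11.0 17.0
    }
}
```

A single global buffer like this is not thread-safe. A multi-threaded version would need one buffer per thread, which is part of why stack-allocated temporaries are attractive.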

I don't know that the object model is why writing a BLAS in Java doesn't make sense. After all they special case `float` and `double` as primitives, which bifurcates the whole type system and is its own whole issue, but means that you can store them efficiently inline. I'm actually not sure what stops someone from writing a BLAS in Java except that it would be hard and there's no point.
I like your response, and yes, it was a major oversimplification and I'm sorry for that.

Indeed, it is always about design choices and trade-offs. I can see why BLAS code is important and why Julia is an optimal choice for computation heavy problems.

I love GC: it solves a ton of nasty problems in a programming language design with a single feature that users mostly don't have to worry about. Just because you have a GC, however, doesn't mean that it's a good idea to generate as much garbage as you can; garbage collection isn't free. That's where Java IMO went wrong. Java's design (objects are subtypeable by default and mutable with reference semantics) generates an absolutely epic amount of garbage. It seems like the hope was that improvements in GC technology would make this a non-issue in the future, but we're in the future and it hasn't turned out that way: even with the vast amounts of money that have been spent on JVM GCs, garbage is still often an issue in Java. And this has given GC in general a bad name, IMO quite unfairly. It just happens that Java simultaneously popularized GC and gave it a bad name by having a design that made it virtually impossible for the GC to keep up with the amount of garbage that was generated.

It is entirely possible to design a garbage collected language that doesn't generate so goddamned much garbage, and this works much, much better because a relatively simple GC can easily keep up. Julia and Go are good examples of this. Julia uses immutable types extensively and by default, while Go uses value semantics, which has a similar effect on garbage (but has other issues). With a language design that doesn't spew so much garbage, if you only care about throughput, a relatively simple generational mark-and-sweep collector is totally fine. This is what Julia has. If you also want to minimize GC pause latency, then you need to get fancier like Go (I think they have a concurrent collector that can be paused when its time slice is up and resumed later).

Persistent data structures are a whole different question that I haven't really spent much time thinking about. Clojure seems to be the state of the art there but I have no idea if that's because of the JVM or despite it.

> If you also want to minimize GC pause latency, then you need to get fancier like Go (I think they have a concurrent collector that can be paused when its time slice is up and resumed later).

How possible would it be for Julia to add this? I keep thinking Julia would be great for graphical environments and gaming, but high GC latency won't work there.

Thanks for the reply!

Unfortunately, persistent data structures tend to produce (short-lived) garbage which the JVM is very good at collecting!

So yes, Clojure benefits immensely from the JVM.

It is also an interesting research topic whether (optimised) reference counting would be a better approach.

Regarding objects, there is also a "middle ground" to consider:

Split big (immutable) arrays into smaller ones, connect them with some pointers in between, and you are still cache friendly.
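One possible reading of this "middle ground", sketched in Java. `ChunkedArray` and `CHUNK` are hypothetical names, and real persistent vectors (such as Clojure's) use a tree rather than a flat spine. Each chunk is a contiguous primitive array, and an update copies only the spine and the one chunk that changes.

```java
import java.util.Arrays;

// Hypothetical sketch; ChunkedArray is an illustrative name, not a library class.
final class ChunkedArray {
    static final int CHUNK = 4;       // tiny for illustration; a real chunk would be much larger
    private final long[][] chunks;    // spine of pointers to fixed-size chunks
    private final int length;

    private ChunkedArray(long[][] chunks, int length) {
        this.chunks = chunks;
        this.length = length;
    }

    static ChunkedArray of(long... values) {
        int n = (values.length + CHUNK - 1) / CHUNK;
        long[][] chunks = new long[n][];
        for (int c = 0; c < n; c++) {
            int from = c * CHUNK;
            chunks[c] = Arrays.copyOfRange(values, from, Math.min(from + CHUNK, values.length));
        }
        return new ChunkedArray(chunks, values.length);
    }

    long get(int i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException();
        return chunks[i / CHUNK][i % CHUNK];
    }

    // Returns a new version; copies the spine and one chunk, shares all other chunks.
    ChunkedArray with(int i, long value) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException();
        long[][] newChunks = chunks.clone();         // shallow copy: chunk pointers only
        long[] changed = chunks[i / CHUNK].clone();  // copy only the chunk that changes
        changed[i % CHUNK] = value;
        newChunks[i / CHUNK] = changed;
        return new ChunkedArray(newChunks, length);
    }

    // True if chunk c is the very same array in both versions (structural sharing).
    boolean sharesChunkWith(ChunkedArray other, int c) {
        return chunks[c] == other.chunks[c];
    }
}
```

For example, after `ChunkedArray v2 = v1.with(5, 60L)` on an eight-element `v1`, the old version still reads its original value at index 5, and the two versions share chunk 0 but not chunk 1. The garbage produced per update is one spine plus one chunk, rather than a copy of the whole array.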

Also, you can do a lot at the application level to reduce garbage, and most Java programmers don't bother precisely because the JVM handles it so well.

> garbage is still often an issue in Java.

Not anymore. That future is here. Java is getting "flattenable" types not because of GC, but because of iteration.

The biggest struggle Julia's GC has is that in multi-threaded workloads, it sometimes isn't aggressive enough about reclaiming memory, leading to OOM.
This is a very legit issue that the compiler team has their eye on and plans to work on.
fyi, I'm oscardssmith on most other channels.
Hi, Oscar! Nice user name :P
It's my backup username discovered when I was 10 or so, and I wanted something that wasn't my name and would be available everywhere.