Hacker News new | ask | show | jobs
by odipar 1912 days ago
I like Julia (mostly because of multiple dispatch). The only thing that's lacking is an industry strength Garbage Collector, something that can be found in the JVM.

I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.

3 comments

Julia's garbage collector is quite good.

> I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.

If you use immutable data-structures in julia, you're rather unlikely to end up with any heap allocations at all. Unlike Java, Julia is very capable of stack allocating user defined types.

I think that’s true for small structs made of floats but not true for something like an immutable lisp-style linked list.
Not just floats, and I'm not sure they have to be that small. All sorts of structs containing bitstypes/value types can be stack allocated. In fact, even some structs with pointers to heap-allocated memory can be stack-allocated (such as array views.)

I don't know about linked lists, though.

A low-latency GC would also be great. But again, the JVM only has that due to many millions of dollars spent over decades.
I didn't even know julia GC had issues. Care to elaborate?
It doesn’t, it just doesn’t have a $100B GC like Java does. Rather than spending that kind of money trying to compensate for a language design that generates massive amounts of garbage (ie Java), Julia takes the approach of making it easier to avoid generating garbage in the first place, eg by using immutable structures that can be stack allocated and having nice APIs for modifying pre-allocated data structures in place.
I don't think you can allocate immutable data structures on the stack.

I've never seen a 10 million entry immutable set on the stack but I could be wrong.

Obviously a 10-million-element array doesn't get stack allocated. But if individual objects of some type are immutable, then they can be stack allocated, or maybe not allocated at all (kept in registers).

Edit: reading your other post, it seems like you may mean persistent data structures, a la Clojure, rather than immutable structures, which are quite different. The former would indeed always be heap-allocated (it's necessary since they are quite pointer-heavy). Immutable structures, on the other hand are detached from any particular location in memory.

Moreover, if the elements in an array are mutable, eg Java objects, then each one needs to be individually heap allocated with a vtable pointer and the array has to be an array of pointers to those individually allocated objects. For pointer-sized objects (say an object that has a single pointer-sized field), that takes 3x memory to store x objects, so that's already brutal, but worse is that since the objects are all individually allocated, the GC needs to look at every single one, and freeing the space is a fragmentation nightmare. If the objects are immutable (and the type is final; btw all concrete types are final in Julia), then you can store them inline with no overhead and GC can deal with them a single big block.

Btw, I had to vouch for you to undead your posts in order to reply. Looks like you got downvoted a bunch.

Ouch.
I mean I’m not trying to hate on Java — pointer-heavy programming was all the rage when it was designed, and GC was a hot research topic, so there was good reason to be optimistic about that approach. But it turns out that it’s very hard to make up for generating tons of garbage and pointer-heavy programming hasn’t aged well given the way hardware has evolved (pointers are large and indirection is expensive).
You're mixing two things here: memory management and memory layout. This "pointer-heavy programming" is, indeed, a bad fit for modern hardware in terms of processing speed due to cache misses, which is why even Java is now getting user-defined primitive types (aka inline types, aka value types), but in terms of memory management, in recent versions OpenJDK is pretty spectacular, not only in throughput but also latency (ZGC in JDK 16 has sub millisecond maximum pause time for any size of heap and up to a very respectable allocation rate: https://malloc.se/blog/zgc-jdk16 and both throughput and max allocation rate are expected to grow drastically in the coming year with ZGC becoming generational). As far as performance is concerned, GC can now be considered a solved problem (albeit one that requires a complex implementation); the only real price you pay is in footprint overhead.
I'm not — memory layout and memory management are (fairly obviously, I would think) intimately related. In particular, pointer-heavy memory layouts put way more stress on the garbage collector. Java's choice of making objects mutable, subtypeable and have reference semantics, basically forces them to be individually heap-allocated and accessed via pointers. On the other hand, if you design your language so that you can avoid heap allocating lots of individual objects, then you can get away with a much simpler garbage collector. Java only needs spectacular GC technology because the language is designed in such a way that it generates a spectacular amount of garbage.
Hating on Java seems perfectly reasonable to me.
so are you for GC or against GC?

In other posts you actually argue that GCs help you reduce complexity because manual memory management is too much of a hassle.

May be immutable is not the correct term - persistent data structures is what I like support for: that is my use-case.

I think you can have efficient persistent data structures without a GC, but that requires fast reference counting and in turn, that requires a lot of work to be competitive with the JVM.

I also understand that my use-case is not Julia's focus. That's perfectly fine.

That's a major oversimplification. GC is good for ease of use and safety of a high level language. GC is never as performant as not requiring heap allocations at all. Julia has a GC, but also provides a lot of tools to avoid needing the GC in high performance computations. This combination gives ease of use and performance.

Java sacrifices some performance for having this "one paradigm" of all objects, and then heavily invested in the GC, but in many cases like writing a BLAS it still just will not give performance exactly matching a highly tuned code, where as in Julia for example you can write really fast BLAS codes like Octavian.jl.

Julia is multi-paradigm in a way that is purposely designed for how these features compose. I think it's important to appreciate that design choice, in both its pros and cons.

I love GC — it solves a ton of nasty problems in a programming language design with a single feature that users mostly don't have to worry about. Just because you have a GC, however, doesn't mean that it's a good idea to generate as much garbage as you can — garbage collection isn't free. That's where Java IMO went wrong. Java's design — objects are and subtypeable (by default) and mutable with reference semantics — generates an absolute epic amount of garbage. It seems like the hope was that improvements in GC technology would make this a non-issue in the future, but we're in the future and it hasn't turned out that way: even with vast amounts of money that have been spent on JVM GCs, garbage is still often an issue in Java. And this has given GC in general a bad name IMO quite unfairly. It just happens that Java simultaneously popularized GC and gave it a bad name by having a design that made it virtually impossible for the GC to keep up with the amount of garbage that was generated.

It is entirely possible to design a garbage collected language that doesn't generate so goddamned much garbage — and this works much, much better because a relatively simple GC can easily keep up. Julia and Go are good examples of this. Julia uses immutable types extensively and by default, while Go uses value semantics, which has a similar effect on garbage (but has other issues). With a language design that doesn't spew so much garbage, if you only care about throughput, a relatively simple generational mark-and-sweep collector is totally fine. This is what Julia has. If you also want to minimize GC pause latency, then you need to get fancier like Go (I think they have a concurrent collector that can be paused when it's time slice is up and resumed later).

Persistent data structures are a whole different question that I haven't really spent much time thinking about. Clojure seems to be the state of the art there but I have no idea if that's because of the JVM or despite it.

The biggest struggle Julia's GC has is that in multi-threaded workloads, it sometimes isn't aggressive enough to reclaim memory leading to OOM.
This is very legit issue that the compiler team has their eye on and plans to work on.
fyi, I'm oscardssmith on most other channels.
Hi, Oscar! Nice user name :P
It's my backup username discovered when I was 10 or so, and I wanted something that wasn't my name and would be available everywhere.