by chubot 1256 days ago
So I'd say Oil's collector is highly unusual and not applicable to most problems! (I now think that "every GC is a snowflake" -- it's such a multi-dimensional design space)

It's unusual because it's a precise collector in C++, and what I slowly realized is that that problem is basically impossible for any non-trivial software, without changing the C++ language itself :)

It seems like that hasn't happened, despite efforts over decades. I added this link about C++ GC support to the appendix, which also explains our unique constraints.

Garbage collection in the next C++ standard (Boehm 2009)

https://dl.acm.org/doi/abs/10.1145/1542431.1542437

http://www.oilshell.org/blog/2023/01/garbage-collector.html#...

---

Precise GC can work for Oil because it's a shell that links with extremely little third-party code, and has relatively low perf requirements. We depend on libc and GNU readline, just like bash. And those libraries are basically old-school C functions which are easy to wrap with a GC.

(Also as Aidenn mentioned, shells use process-based concurrency, which means we don't have threads. The fact that it's mostly generated C++ code is also important, as mentioned in the post)

---

The funny thing is that one reason I started this project is that I worked with "big data" frameworks on clusters, but I found that you can do a lot on a single machine. (similar in spirit to the recent "Twitter on one machine" post https://news.ycombinator.com/item?id=34291191 )

I would just use shell scripts to saturate ~64 cores / 128 G of RAM, rather than dealing with slow schedulers and distributed file systems.

But garbage collectors and memory management are a main reason you can't use all of a machine from one process. There's just so much room for contention. Also the hardware was trending toward NUMA at the time, and probably is even more now, so processes make even more sense.

All of that is to say that I'm a little scared of multi-threaded GC ... especially when linking in lots of third party libraries.

And AFAIK heaps with tens or hundreds of gigabytes are still in the "not practical" range ... or they would take a huge amount of engineering effort.

---

But of course there are many domains where you don't have embarrassingly parallel problems, and writing tight single- or multi-threaded code is the best solution.

Some more color here: https://old.reddit.com/r/oilshell/comments/109t7os/pictures_...

I wonder if Clasp has any support for multi-process programming? Beyond Unix pipes, you could also use shared memory and maybe some semaphores to synchronize access, and avoid copying. I think of that as sort of "inverting" the problem. Pointer-rich data is probably annoying to deal with in shared memory, but there are lots of representations for data and I imagine Lisps could take advantage of some of them, e.g. https://github.com/oilshell/oil/wiki/Compact-AST-Representat...
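
To make the shared memory idea concrete, here is a minimal POSIX sketch (not taken from Oil or Clasp; `Shared`, `ParallelSum`, `kWorkers` and `kItems` are made-up names). It assumes a Unix-like system with `fork()` and anonymous `mmap()`. The data is a flat array with no pointers, each worker process writes only to its own slot, and `waitpid()` is the only synchronization, so no semaphore is needed. Workers that wrote to overlapping locations would need one.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdint>

constexpr int kWorkers = 4;    // hypothetical worker count
constexpr int kItems = 1000;   // hypothetical input size

// Flat, pointer-free layout, so it means the same thing in every process.
struct Shared {
    std::int64_t input[kItems];
    std::int64_t partial[kWorkers];  // one slot per worker: no write contention
};

// Sums 0..kItems-1 using forked workers. Returns -1 on any failure.
std::int64_t ParallelSum() {
    void* mem = mmap(nullptr, sizeof(Shared), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    auto* sh = static_cast<Shared*>(mem);
    for (int i = 0; i < kItems; ++i) sh->input[i] = i;

    pid_t pids[kWorkers];
    int started = 0;
    for (int w = 0; w < kWorkers; ++w) {
        pid_t pid = fork();
        if (pid < 0) break;  // could not start this worker
        if (pid == 0) {
            // Child: reads the shared input in place, no copying over a pipe.
            std::int64_t sum = 0;
            for (int i = w; i < kItems; i += kWorkers) sum += sh->input[i];
            sh->partial[w] = sum;
            _exit(0);
        }
        pids[started++] = pid;
    }

    // Parent: waiting for each child is the synchronization point.
    bool ok = (started == kWorkers);
    for (int i = 0; i < started; ++i) {
        int status = 0;
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0) {
            ok = false;
        }
    }

    std::int64_t total = 0;
    for (int w = 0; w < kWorkers; ++w) total += sh->partial[w];
    munmap(mem, sizeof(Shared));
    return ok ? total : -1;
}
```

Each process still has its own heap and its own collector, which is the "inverting" part: only the flat data is shared.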

1 comments

Thank you. We do precise GC in C++. I wrote a C++ static analyzer in Lisp that uses the Clang front end and analyzes all of our C++ code and generates maps of GC-managed pointers in all classes. We precisely update pointers in thousands of classes that way. We also use it to save the system's state to a file or relinked executable so we can start up quickly later. Startup times using that are under 2 seconds on a reasonable CPU.
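
For readers who haven't seen the technique, here is a toy sketch of what per-class pointer maps buy you when tracing. This is not Clasp's code or its generated output; `ObjHeader`, `TypeMap`, `Node`, `kNodeMap` and `Mark` are hypothetical names, and the map is written by hand where a static analyzer would generate it. It assumes every managed pointer field points at an object that begins with an `ObjHeader`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_set>
#include <vector>

struct TypeMap;

// Hypothetical header that every GC-managed object starts with.
struct ObjHeader {
    const TypeMap* map;  // tells the collector where this object's pointers are
};

// Per-class map: byte offsets of the GC-managed pointer fields.
struct TypeMap {
    std::vector<std::size_t> pointer_offsets;
};

// Example class: one managed pointer, one pointer-sized integer.
struct Node {
    ObjHeader header;
    Node* next;           // managed pointer, listed in the map
    std::intptr_t value;  // not in the map, so it is never followed
};

// Hand-written stand-in for analyzer output.
static const TypeMap kNodeMap{{offsetof(Node, next)}};

// Marks everything reachable from root, following only mapped offsets.
std::unordered_set<const void*> Mark(ObjHeader* root) {
    std::unordered_set<const void*> marked;
    std::vector<ObjHeader*> stack;
    if (root != nullptr) stack.push_back(root);
    while (!stack.empty()) {
        ObjHeader* obj = stack.back();
        stack.pop_back();
        if (!marked.insert(obj).second) continue;  // already visited
        assert(obj->map != nullptr);
        const char* base = reinterpret_cast<const char*>(obj);
        for (std::size_t off : obj->map->pointer_offsets) {
            ObjHeader* child = nullptr;
            std::memcpy(&child, base + off, sizeof child);
            if (child != nullptr) stack.push_back(child);
        }
    }
    return marked;
}
```

The point of the map is the `value` field: a conservative scan would have to treat any word that looks like an address as a possible pointer, while the precise version never looks at it. This only covers tracing the heap, not finding roots on the stack.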

Ah OK very interesting ... So the tracing is precise, but what about rooting? From your other comment it sounded like it can be imprecise. But maybe that's for dynamic linking where you don't have source access?

In any case, I would imagine embedding a C++ compiler at runtime does open up a lot more options!