by kazinator 1482 days ago
The RAM usage is tied to the size of the reachable set, plus some slack filled with garbage that depends on how you tune the garbage collection thresholds.

By today's standards, the RAM usage isn't necessarily huge.

Here is the TXR Lisp compiler recompiling stdlib/compiler.tl -> stdlib/compiler.tlo, as seen in top:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
  11488 kaz       20   0   17800  14804   2964 R  98.1  0.7   0:07.07 txr   
                           ^^^^^  ^^^^^
On the order of a bash session. It's a lot of RAM by 1982 standards at the institution level, and even 1992 standards at the consumer level, but today it means nothing.

You can easily see a Bash process with a footprint on that order.

It could be reduced by tuning the garbage collector. One way to do that is to build for less memory use (useful for embedded). Here it is with txr rebuilt using #define CONFIG_SMALL_MEM 1 in config.h:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
  12838 kaz       20   0   11964   9768   3140 R  99.0  0.5   0:10.39 txr         
Bash footprints for comparison:

  $ ps aux | head -1 ; ps aux | grep bash
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  kaz       1093  0.0  0.1   9288  2132 pts/2    Ss+  May15   0:01 -bash
  kaz       2833  0.0  0.0   8904  1992 pts/0    Ss   May15   0:00 -bash
  kaz       3509  0.0  0.2  10532  4988 pts/1    Ss+  May15   0:28 -bash
  kaz       7898  0.0  0.1   8968  2212 pts/3    Ss+  May20   0:00 -bash
Lists are used for everything: the compiler produces list-based assembly code, which is used from then on through assembly. There is an optimizer which divides it into basic blocks, which are objects put into a graph, but the instructions are still lists. The peephole pattern matching is done on lists. The compiler does not bother using destructive append (nconc) for stitching together fragments of code; it just uses straight garbage-generating appends. The same goes for most of the other rewriting that happens later.
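
To make the append versus nconc distinction concrete, here is a minimal sketch in Python rather than TXR Lisp. The Cons class, the helper names and the placeholder items "a" through "f" are all made up for illustration; this is not how TXR implements its lists. It only counts how many fresh cells each way of joining two fragments allocates.

```python
class Cons:
    """A Lisp-style cons cell; None plays the role of nil."""
    allocated = 0  # running count of cells ever created

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr
        Cons.allocated += 1


def make_list(*items):
    """Build a cons list from Python arguments."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def to_py(lst):
    """Convert a cons list to a Python list for inspection."""
    out = []
    while lst is not None:
        out.append(lst.car)
        lst = lst.cdr
    return out


def append(a, b):
    """Non-destructive append: copies every cell of a, shares b."""
    if a is None:
        return b
    head = tail = Cons(a.car)
    a = a.cdr
    while a is not None:
        tail.cdr = Cons(a.car)
        tail = tail.cdr
        a = a.cdr
    tail.cdr = b
    return head


def nconc(a, b):
    """Destructive append: relinks the last cell of a, allocates nothing."""
    if a is None:
        return b
    tail = a
    while tail.cdr is not None:
        tail = tail.cdr
    tail.cdr = b
    return a


def cells_used(op):
    """Count cells allocated while joining two three-item fragments."""
    frag1 = make_list("a", "b", "c")  # placeholder "instructions"
    frag2 = make_list("d", "e", "f")
    before = Cons.allocated
    joined = op(frag1, frag2)
    return to_py(joined), Cons.allocated - before


append_result, append_cells = cells_used(append)
nconc_result, nconc_cells = cells_used(nconc)
print(append_result, append_cells)
print(nconc_result, nconc_cells)
```

Both calls produce the same six-item list. The append version allocates three new cells, and the three cells of the original first fragment become garbage once nothing else refers to them. The nconc version allocates none, at the price of mutating its first argument.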

On a computer in 1960, your compiler would be capped at the physical memory available. That would be the RAM use. The garbage collector would have to be called whenever the memory is exhausted, or else the show would stop. A successful compilation would demonstrate that the compiler needed no more memory than what the machine has. The closer its actual usage would be to the available memory, the longer it would take, due to the frequent garbage collections required to stay afloat.
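
The relationship between headroom and collection frequency can be sketched with a toy model. The assumptions are mine, not measurements: the reachable set stays constant, every collection recovers everything that is not reachable, and each collection costs work proportional to the reachable set. The function names and the numbers are hypothetical.

```python
import math


def collections_needed(total_alloc, live, capacity):
    """Approximate number of GC cycles under a fixed memory cap.

    Simplifying assumption: the live (reachable) set stays constant at
    `live`, so each collection recovers exactly capacity - live units.
    """
    headroom = capacity - live
    if headroom <= 0:
        raise MemoryError("live set does not fit in memory")
    return math.ceil(total_alloc / headroom)


def gc_work(total_alloc, live, capacity):
    """Total marking work, assuming each cycle costs `live` units."""
    return collections_needed(total_alloc, live, capacity) * live


# Hypothetical job: 6000K reachable, 100000K allocated over the whole run.
results = {}
for capacity in (12000, 8000, 6500, 6100):
    results[capacity] = collections_needed(100_000, 6000, capacity)
    print(capacity, results[capacity], gc_work(100_000, 6000, capacity))
```

In this model, shrinking the cap from 12000K to 6100K takes the job from 17 collections to 1000. The job still completes, but nearly all of the time goes into garbage collection.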

I'd say that given people's expectations today, shaped by experiences with everyday software, they likely greatly overestimate how much RAM you need for Lisp compiling.


Appendix: the VM footprint number doesn't really give us a breakdown, since it includes the executable and shared library mappings. I ran the second compile again with the memory-optimized build, and this time captured a pmap.

Here you can also see the full command, confirming the compile job:

  $ pmap 24087
  24087:   ./txr --in-package=sys --compile=stdlib/compiler.tl:stdlib/compiler.tlo.tmp
  08048000   1660K r-x-- txr
  081e7000      4K r---- txr
  081e8000     12K rw--- txr
  081eb000    124K rw---   [ anon ]
  08c03000   6188K rw---   [ anon ]
  b7c5e000      8K rw---   [ anon ]
  b7c60000   1876K r-x-- libc-2.27.so
  b7e35000      4K ----- libc-2.27.so
  b7e36000      8K r---- libc-2.27.so
  b7e38000      4K rw--- libc-2.27.so
  b7e39000     12K rw---   [ anon ]
  b7e3c000    116K r-x-- libz.so.1.2.11
  b7e59000      4K r---- libz.so.1.2.11
  b7e5a000      4K rw--- libz.so.1.2.11
  b7e5b000     28K r-x-- libffi.so.6.0.4
  b7e62000      4K r---- libffi.so.6.0.4
  b7e63000      4K rw--- libffi.so.6.0.4
  b7e64000     12K r-x-- libdl-2.27.so
  b7e67000      4K r---- libdl-2.27.so
  b7e68000      4K rw--- libdl-2.27.so
  b7e69000     36K r-x-- libcrypt-2.27.so
  b7e72000      4K r---- libcrypt-2.27.so
  b7e73000      4K rw--- libcrypt-2.27.so
  b7e74000    156K rw---   [ anon ]
  b7e9b000   1024K r-x-- libm-2.27.so
  b7f9b000      4K r---- libm-2.27.so
  b7f9c000      4K rw--- libm-2.27.so
  b7fba000      8K rw---   [ anon ]
  b7fbc000     12K r----   [ anon ]
  b7fbf000      8K r-x--   [ anon ]
  b7fc1000    152K r-x-- ld-2.27.so
  b7fe7000      4K r---- ld-2.27.so
  b7fe8000      4K rw--- ld-2.27.so
  bf8d9000    200K rw---   [ stack ]
   total    11700K
You can see the 11700K fairly closely matches the earlier VIRT figure of 11964.

Anyway, look at the large [ anon ] heap area: it's 6188K, about 6 megs. That's it. That's where all the dynamic Lisp stuff is: all the predefined symbols and function bindings and whatnot, and all the objects allocated during the compile job.
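
For anyone who wants the breakdown as numbers, here is a small Python sketch that sums the pmap listing above by kind of mapping. The four categories (txr, anon, stack, libs) are my own grouping for illustration, not something pmap reports, and the "anon" bucket lumps together every [ anon ] region regardless of permissions.

```python
import re
from collections import defaultdict

# Mapping lines copied from the pmap listing (header and total omitted).
PMAP_OUTPUT = """\
08048000   1660K r-x-- txr
081e7000      4K r---- txr
081e8000     12K rw--- txr
081eb000    124K rw---   [ anon ]
08c03000   6188K rw---   [ anon ]
b7c5e000      8K rw---   [ anon ]
b7c60000   1876K r-x-- libc-2.27.so
b7e35000      4K ----- libc-2.27.so
b7e36000      8K r---- libc-2.27.so
b7e38000      4K rw--- libc-2.27.so
b7e39000     12K rw---   [ anon ]
b7e3c000    116K r-x-- libz.so.1.2.11
b7e59000      4K r---- libz.so.1.2.11
b7e5a000      4K rw--- libz.so.1.2.11
b7e5b000     28K r-x-- libffi.so.6.0.4
b7e62000      4K r---- libffi.so.6.0.4
b7e63000      4K rw--- libffi.so.6.0.4
b7e64000     12K r-x-- libdl-2.27.so
b7e67000      4K r---- libdl-2.27.so
b7e68000      4K rw--- libdl-2.27.so
b7e69000     36K r-x-- libcrypt-2.27.so
b7e72000      4K r---- libcrypt-2.27.so
b7e73000      4K rw--- libcrypt-2.27.so
b7e74000    156K rw---   [ anon ]
b7e9b000   1024K r-x-- libm-2.27.so
b7f9b000      4K r---- libm-2.27.so
b7f9c000      4K rw--- libm-2.27.so
b7fba000      8K rw---   [ anon ]
b7fbc000     12K r----   [ anon ]
b7fbf000      8K r-x--   [ anon ]
b7fc1000    152K r-x-- ld-2.27.so
b7fe7000      4K r---- ld-2.27.so
b7fe8000      4K rw--- ld-2.27.so
bf8d9000    200K rw---   [ stack ]
"""

# address, size in K, permissions, mapping name
LINE = re.compile(r"^[0-9a-f]+\s+(\d+)K\s+\S+\s+(.+?)\s*$")


def breakdown(text):
    """Sum mapping sizes (in K) per category: txr, anon, stack, libs."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # ignore anything that is not a mapping line
        size, name = int(m.group(1)), m.group(2)
        if name == "[ anon ]":
            key = "anon"
        elif name == "[ stack ]":
            key = "stack"
        elif name == "txr":
            key = "txr"
        else:
            key = "libs"  # everything else is a shared library here
        totals[key] += size
    return dict(totals)


totals = breakdown(PMAP_OUTPUT)
grand_total = sum(totals.values())
for key, size in sorted(totals.items()):
    print(f"{key:6s} {size:6d}K  {100 * size / grand_total:5.1f}%")
print(f"total  {grand_total:6d}K")
```

The categories add up to the 11700K that pmap reports: 1676K for the txr image, 3308K for shared libraries, 200K of stack and 6516K anonymous, of which the single 6188K region is the heap.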

libz is new; I integrated libz into TXR in just the most recent release. It happens to be number 277, so I code named it (L)Z77.