by Rochus 1436 days ago
It is true that there is uncollected garbage in the original Xerox ST80 image. I've built some tools to analyze the image, and also a VM which can be interrupted at any time to analyze the current state of the image (see https://github.com/rochus-keller/Smalltalk).

There are two zombie processes (OID 6662 and 19ba). There are also a couple of BlockContext and MethodContext instances which have a nil sender and a reference to an unknown method, but which are still referenced from somewhere (i.e. they cannot be collected, even with mark & sweep). E.g. OID 79a2 of class BinaryChoice. I have a full list if anybody is interested.
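
To illustrate why a stale context survives mark & sweep as long as anything reachable still points at it, here is a minimal sketch in Python. The toy heap and all object names in it are hypothetical and are not taken from the ST80 image; it only models the mark phase as plain reachability from a root.

```python
from collections import deque


def reachable(roots, refs):
    """Mark phase: return every object id reachable from the given roots."""
    marked = set(roots)
    worklist = deque(roots)
    while worklist:
        oid = worklist.popleft()
        for target in refs.get(oid, ()):
            if target not in marked:
                marked.add(target)
                worklist.append(target)
    return marked


# Hypothetical toy heap: object id -> ids of the objects it references.
heap = {
    "root": ["scheduler", "view"],
    "scheduler": [],
    "view": ["stale_context"],   # a live object still points at the old context
    "stale_context": [],         # nil sender, but referenced, so it is kept
    "orphan": ["orphan_child"],  # not referenced by anything reachable
    "orphan_child": [],
}

live = reachable(["root"], heap)
garbage = set(heap) - live
print(sorted(garbage))  # prints ['orphan', 'orphan_child']
```

In this sketch only the orphaned pair would be swept; `stale_context` stays because `view` still holds a reference to it, whether or not anything will ever use it again.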

It's things like this that always made me look a bit askance at Smalltalk. It feels a little like "Document_Final_FINAL_v3_(2).docx", a little like a dirty old whiteboard where you can still see outlines of notes from last year. Might not be a fair assessment, but as an outsider I've always felt this way about image-based systems.

Deploying an application was always an adventure. You could never be completely sure that it didn't contain things it shouldn't.

( cont. from https://news.ycombinator.com/item?id=31999774 )

> … no algorithm that can identify all dead code reliably…

Why were you so concerned with the removal of all dead code?

Compared to "… nobody had the source code…" it seems like a minor issue.

( cont. )

imo the more usual concern would be mistakenly removing code that was not dead.

Without a design document, we might think there would be no senders of #factorial without understanding that the intention was to invoke that method on the command line.

For example,

    $ cat fact.st
    Stdio stdout 
        nextPutAll: 100 factorial printString; 
        nextPut: Character lf.!
    SmalltalkImage current snapshot: false andQuit: true!

    $ bin/pharo --headless Pharo10-SNAPSHOT-64bit-502addc.image fact.st
    93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

So: instrument the methods and log which ones actually get called (see https://stackoverflow.com/a/162719).

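As a rough illustration of the instrument-and-log idea, here is a small Python sketch. The decorator, the `Calculator` class and its methods are all hypothetical and do not reproduce the linked answer. It records which instrumented methods were invoked during one run.

```python
import functools

called = set()  # qualified names of instrumented methods seen at run time


def record_calls(func):
    """Wrap func so that each invocation records its qualified name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        called.add(func.__qualname__)
        return func(*args, **kwargs)
    return wrapper


class Calculator:  # hypothetical class, for illustration only
    @record_calls
    def factorial(self, n):
        return 1 if n <= 1 else n * self.factorial(n - 1)

    @record_calls
    def unused(self):
        return None


result = Calculator().factorial(5)
instrumented = {"Calculator.factorial", "Calculator.unused"}
never_called = instrumented - called
print(sorted(never_called))  # prints ['Calculator.unused']
```

A method that never shows up in the log was only unused in the runs that were observed. As with #factorial above, it may still be invoked from a script or the command line, so the log is evidence rather than proof.
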
For some meaning of "contain things it shouldn't":

Given that the base image we started working with didn't "contain things it shouldn't", why could we not be completely sure that base image + source code fileIn didn't "contain things it shouldn't"?

First, like most other companies using ST, we were not working on the original image, but on a company-specific one that had been in use for a long time and contained a lot of company-specific stuff. On the other hand, there were also things in the original images of the commercial STs that one would not or should not ship with the product.

Firstly:

company-specific base image = base image + company-specific source code fileIn

Why could we not be completely sure that company-specific base image + source code fileIn didn't "contain things it shouldn't"?

Secondly:

> … also things in the original images of the commercial STs that one would or should not ship with the product.

That doesn't seem to be an example of "You could never be completely sure that it didn't contain things it shouldn't."

That seems to be an example of you being completely sure.

Not sure what you're getting at. There were things in the image for which nobody had the source code (or the current version) anymore, but even with the source code it was a nightmare. It would undoubtedly have been less bad if I had then had tools like the ones I recently built for ST80, and the knowledge gained with them.

Which is why image trimming tools did exist.

I had Envy in later projects, which was very useful, but you simply had to trust it, which is not the same as being completely sure.

Is there uncollected garbage in the Smalltalk Squeak 6.0 image?

I have no information about this. Squeak uses a different image format, which is incompatible with my tools.