Hacker News new | ask | show | jobs
by aportnoy 1106 days ago
> In general lisps _can't_ print arbitrary objects in a form that can be read back in.

Hmm, why not?

2 comments

How would one print a function with a free variable? Consider this (but imagine there is surrounding code which defines y):

    (lambda () y)
How can we print this in such a way that preserves semantics?

You can’t just print the original source definition, as y would then be undefined (or I suppose it could be defined to be whatever y is in scope at the deserialization call site or something thereabouts, but that wouldn’t guarantee that the deserialized function is equivalent to the other with respect to how it behaves, despite the source being identical).

You can’t just inline the serialization of y, because the original function returns a particular reference, and there’s no guarantee that the rest of the dependent code will not expect a stable, unique (and possibly mutable/state-full) instance of this value.

I suppose you could, on the side, also serialize the entire program state (or rather, at least the stack and heap), but then things get tricky when any object refers to something like an opaque pointer (even if you have an extent you can safely copy, you’d have to handle issues like different virtual addresses being mapped) or file descriptors, etc.

But maybe all those challenges can be overcome — so I’ll ask: how would you go about serializing an arbitrary object in such a way that one could deserialize an equivalent object?

In the original LISP with dynamic scoping I think it would be fair to serialize that expression as is: "just use whatever value is bound to 'y' at the time of evaluation, if any".
In fact, the Racket dialect being discussed has a facility for this:

https://docs.racket-lang.org/web-server-internal/closure.htm...

It is correct that usually Lisps do not have a print-read consistency for every object, just objects that someone has decided to care about. Most Common Lisps are like this. However, note that many of them have image saving. Image saving traverses the heap and serializes every object. That includes lexical closures. Their code vector, environment vector(s): everything is traversed. It's just not a printed notation!

If some object isn't printable and someone wants it badly to be, it can be arranged somehow.

It boils down to pragmatics.

Foreign objects (FFI handles holding foreign pointers) are going to be tricky; you would need some hooking mechanism to try to do something meaningful.

Image saving already cannot do some things, like saving open network connections such that they come up connected in a restarted image.

Everything is an object that is either a cons cell or an atom. Associate the address of each object with a unique index.

Serialize each object as a pair (index, obj), where 'index' maps back to the original address where the object was stored, and 'obj' is a pair of indices if the object is a cons cell, or the appropriate representation (string literal, integer, etc.) if the object is an atom.

Then to deserialize allocate a memory location for every index and load the objects as is. Then replace the indices with the corresponding (new) addresses.

This definitely isn't "printing" in the original sense, just a serialization algorithm.
> How can we print this in such a way that preserves semantics?

Maybe try something like normalisation-by-evaluation, but if you go to look up a free variable, replace that lookup with a little bit of syntax tree instead?

I haven't worked on any of the classic lisp implementations but can speculate. I suppose the ratio of usefulness to difficulty of implementation doesn't work out.

Mutable state with serialise/deserialise either breaks the aliasing or needs some way to keep the referenced variables alive (nasty interaction with garbage collection) and to re-establish aliasing on deserialise.

Serialising a DAG to a tree tends to duplicate the previously shared nodes. There's then the question of whether deserialising the tree should re-establish the original sharing or some other sharing. Serialising a graph to a (finite) tree needs some notation to represent the cycles.

It looks to me like mutable state is the blocker but I'd be interested in other opinions.

Yeah, something like "a" in the below code wouldn't have a "natural" serialization I don't think:

  (define a (list "a"))
  (set-cdr! a a)