by kazinator 3163 days ago

I can do that even better over C structs in Lisp, in my own home-grown dialect. Suppose we have

  struct foo {
    int count;
    char name[16]; /* not null terminated! */
  };

At the REPL, define foo as an alias for the type with typedef:

  1> (typedef foo (struct foo (count int) (name (array 16 char))))
  #<ffi-type (struct foo (count int) (name (array 16 char)))>

Now put an instance of the Lisp struct into a binary buffer using this FFI type:

  2> (ffi-put #S(foo count 42 name "ABCDABCDABCDABCD") (ffi foo))
  #b'2a00000041424344 4142434441424344 41424344'
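For readers who want to check the byte layout without the dialect at hand, here is a rough Python sketch using the standard struct module. It assumes what the dump above suggests: a 4-byte little-endian int followed by 16 raw chars with no padding. The name FOO is just a label for this illustration.

```python
import struct

# Assumed layout, read off the hex dump: little-endian 4-byte int,
# then a 16-byte char array, no padding, no terminator.
FOO = struct.Struct("<i16s")

buf = FOO.pack(42, b"ABCDABCDABCDABCD")

print(FOO.size)   # 20
print(buf.hex())  # 2a00000041424344414243444142434441424344
```

The first four bytes are 42 (0x2a) in little-endian order; the remaining sixteen are the name, stored in full.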

Now, recover a new Lisp struct instance from this binary struct:

  3> (ffi-get *2 (ffi foo))
  #S(foo count 42 name "ABCDABCDABCDABCD")

No problem; the FFI type system knows that an "array of char" is different from a null-terminated string, and can make it correspond to a Lisp string in both directions.

Now, for fun, let's poke a zero byte into that buffer:

  4> (set [*2 8] 0)
  0
  5> *2
  #b'2a00000041424344 0042434441424344 41424344'

There it is. Now decode:

  6> (ffi-get *2 (ffi foo))
  #S(foo count 42 name "ABCD\xDC00;BCDABCDABCD")

What's that? My UTF-8 decoder treats the 00 as an invalid byte, and maps it into the low surrogate range U+DCXX, so byte 00 becomes U+DC00. The otherwise optional semicolon was output because the next character in the string, B, is a hex digit and would otherwise be read as part of the escape.
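The same idea can be approximated in Python, as an analogue only and not how the dialect implements it. Python's surrogateescape error handler already maps an undecodable byte 0xNN (0x80 to 0xFF) to U+DCNN; the NUL mapping and the printer are added by hand. The helpers decode_field and show are hypothetical names for this sketch.

```python
def decode_field(raw: bytes) -> str:
    """Decode a fixed-size char array so that every byte stays recoverable."""
    # surrogateescape turns each undecodable byte 0xNN into U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    # Python strings may hold NUL; map it to U+DC00 to mimic the
    # behaviour described above (an assumption of this sketch).
    return text.replace("\x00", "\udc00")


def show(text: str) -> str:
    """Render U+DCxx as \\xDCxx, adding ';' only if a hex digit follows."""
    out = []
    for i, ch in enumerate(text):
        if 0xDC00 <= ord(ch) <= 0xDCFF:
            nxt = text[i + 1 : i + 2]
            sep = ";" if nxt and nxt in "0123456789abcdefABCDEF" else ""
            out.append(f"\\x{ord(ch):04X}{sep}")
        else:
            out.append(ch)
    return "".join(out)


print(show(decode_field(b"ABCD\x00BCDABCDABCD")))  # ABCD\xDC00;BCDABCDABCD
```

Without the semicolon, the escape would swallow the following B, C and D as hex digits.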

If that U+DC00 is encoded back, it will reproduce the null byte:

  7> (ffi-put *6 (ffi foo))
  #b'2a00000041424344 0042434441424344 41424344'
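The reverse direction can be sketched in Python the same way, again as an analogue under the assumption that U+DC00 stands for the NUL byte and any other U+DCNN stands for byte 0xNN. encode_field is a hypothetical helper, not part of any library.

```python
def encode_field(text: str) -> bytes:
    """Encode a string back to raw bytes, undoing the U+DCxx escapes."""
    # U+DC00 goes back to NUL by hand; surrogateescape restores
    # U+DC80..U+DCFF to the original bytes 0x80..0xFF.
    restored = text.replace("\udc00", "\x00")
    return restored.encode("utf-8", errors="surrogateescape")


name = "ABCD\udc00BCDABCDABCD"
raw = encode_field(name)

print(len(raw))   # 16
print(raw.hex())  # 41424344004243444142434441424344
```

The sixteen bytes match the name field of the poked buffer, zero byte included.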