by waltman 3163 days ago
COBOL structures are better for fixed-length strings than C structs are because C really prefers strings to be null terminated. In C you've got to specify the length of each string, whereas in COBOL the compiler takes care of that for you.

I can do that even better in Lisp, on top of C structs, in my own home-grown dialect. Suppose we have:

  struct foo {
    int count;
    char name[16]; /* not null terminated! */
  };
At the REPL, define foo as an alias for that type with typedef:

  1> (typedef foo (struct foo (count int) (name (array 16 char))))
  #<ffi-type (struct foo (count int) (name (array 16 char)))>
Now put an instance of the Lisp struct into a binary buffer using this FFI type:

  2> (ffi-put #S(foo count 42 name "ABCDABCDABCDABCD") (ffi foo))
  #b'2a00000041424344 4142434441424344 41424344'
Now, recover a new Lisp struct instance from this binary struct:

  3> (ffi-get *2 (ffi foo))
  #S(foo count 42 name "ABCDABCDABCDABCD")
No problem; the FFI type system knows that an "array of char" is different from a null terminated string, and can make it correspond to a Lisp string in both directions.
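
For comparison, here is a rough Python sketch of the same round trip using the standard struct module. This is not how my FFI works internally; the format string "<i16s" (little-endian 32-bit int followed by 16 raw bytes) and the helper names put_foo and get_foo are assumptions made up for the illustration.

```python
import struct

# Assumed layout: little-endian 32-bit int, then 16 raw bytes, no padding.
FOO = struct.Struct("<i16s")

def put_foo(count, name):
    """Pack (count, name) into a 20-byte buffer; name is not null terminated."""
    raw = name.encode("utf-8")
    if len(raw) > 16:
        raise ValueError("name does not fit in 16 bytes")
    # struct pads a shorter value with zero bytes up to the field width.
    return FOO.pack(count, raw)

def get_foo(buf):
    """Unpack a 20-byte buffer; all 16 bytes of name are returned as text."""
    count, raw = FOO.unpack(buf)
    return count, raw.decode("utf-8")

buf = put_foo(42, "ABCDABCDABCDABCD")
print(buf.hex())
print(get_foo(buf))
```

The point is the same: the 16s field is a fixed-width byte array, so nothing goes looking for a terminating null.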

Now, for fun, let's poke a zero byte into that buffer:

  4> (set [*2 8] 0)
  0
  5> *2
  #b'2a00000041424344 0042434441424344 41424344'
There it is. Now decode:

  6> (ffi-get *2 (ffi foo))
  #S(foo count 42 name "ABCD\xDC00;BCDABCDABCD")
What's that? My UTF-8 decoder treats the 00 byte as invalid and maps it to U+DC00, in the low-surrogate range U+DCXX. The otherwise optional semicolon after the \xDC00 escape is printed because the next character in the string, B, is a hex digit.

If that U+DC00 is encoded back, it will reproduce the null byte:

  7> (ffi-put *6 (ffi foo))
  #b'2a00000041424344 0042434441424344 41424344'
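
Python has a similar trick built in: the surrogateescape error handler maps each undecodable byte 0xNN to U+DCNN and reverses that on encoding. It only applies to bytes 0x80 and above, so the null-byte handling below is an extra step added to imitate the behavior shown here. The names decode_field and encode_field are invented for this sketch.

```python
def decode_field(raw):
    """Decode bytes to str, escaping undecodable bytes and NUL as U+DCxx."""
    # surrogateescape turns each invalid byte 0xNN (0x80..0xFF) into U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    # Assumption for this sketch: treat NUL as invalid too, mapping it to U+DC00.
    return text.replace("\x00", "\udc00")

def encode_field(text):
    """Inverse of decode_field: reproduce the original bytes exactly."""
    # Undo the NUL mapping first; surrogateescape alone rejects U+DC00.
    return text.replace("\udc00", "\x00").encode("utf-8", errors="surrogateescape")

raw = b"ABCD\x00BCDABCDABCD"        # the 16-byte name field with a zero poked in
text = decode_field(raw)
print(ascii(text))
print(encode_field(text) == raw)
```

Because the mapping is reversible, arbitrary bytes survive a trip through the string type, which is what makes the round trip above reproduce the null byte.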