This is very cool, regardless of how serious it was intended to be taken. Before base-64 encoders/decoders became more common as preinstalled commands in the environments I found myself on, I wrote a base64 utility in mostly pure POSIX shell: https://25thandClement.com/~william/2023/base64.sh
If this project had existed, I might have opted to compile my C-based base-64 encoder and decoder routines, suitably tweaked for pnut's limitations.
I say base64.sh is mostly pure not because it relies on shell extensions, but because the only non-builtins it depends on are od(1) or, alternatively, dd(1), to assist with binary I/O. od(1) is preferable, as reading certain control characters, like NUL, into a shell variable is especially dubious. The encoder is designed to operate on a stream of decimal-encoded bytes. (See `decimals_fast` for using od to encode stdin as decimals, and `decimals_slow` for using dd for the same.)
It looks like pnut uses `read -r` for reading input. In addition to NULs and related raw-byte issues, I was worried about chunking issues (e.g. truncation or errors) on binary data, such as input with no newline within LINE_BUF bytes. Have you tested binary I/O much?
Relatedly, how many different shell implementations have you tested your core scheme with? In addition to bash, dash, and various incarnations of /bin/sh on the BSDs, I also tested base64.sh with Solaris' system shells (ksh88 and ksh93 derivatives), as well as AIX's (a ksh88 derivative). AIX had some odd quirks with pipelines, even with plain-text I/O. (Unfortunately Polar Home is gone now, so I have no easy way to play with AIX; maybe that's for the better.)
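A minimal sketch of the decimal-stream idea, assuming only POSIX od(1) plus shell builtins. The function names `to_decimals`, `b64_char`, and `encode_decimals` are hypothetical and not taken from base64.sh, whose `decimals_fast` may work differently. The point it illustrates is that od turns every byte, including NUL, into a decimal token, so raw bytes never have to be stored in a shell variable.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the actual base64.sh code.

B64=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

# Convert stdin to one unsigned decimal byte value per line.
# -An: omit the offset column; -v: print repeated lines instead of '*';
# -tu1: one unsigned decimal value per byte. A NUL becomes the text "0".
to_decimals() {
  od -An -v -tu1 | while read -r line; do
    for byte in $line; do   # unquoted on purpose: split on whitespace
      printf '%s\n' "$byte"
    done
  done
}

# Print the base64 alphabet character at index $1 (0-63) using only
# POSIX parameter expansion (${var:offset} is not POSIX).
b64_char() {
  rest=$B64 i=$1
  while [ "$i" -gt 0 ]; do
    rest=${rest#?}          # drop the first character
    i=$((i - 1))
  done
  printf '%s' "${rest%"${rest#?}"}"   # keep only the first character
}

# Read decimal byte values (one per line) and write base64 to stdout.
encode_decimals() {
  n=0 acc=0
  while read -r byte; do
    acc=$(( (acc << 8) | byte ))
    n=$((n + 1))
    if [ "$n" -eq 3 ]; then   # 3 bytes (24 bits) -> 4 sextets
      b64_char $(( (acc >> 18) & 63 ))
      b64_char $(( (acc >> 12) & 63 ))
      b64_char $(( (acc >> 6) & 63 ))
      b64_char $(( acc & 63 ))
      n=0 acc=0
    fi
  done
  # Pad a trailing 1- or 2-byte group to 24 bits and emit '=' padding.
  if [ "$n" -eq 1 ]; then
    acc=$((acc << 16))
    b64_char $(( (acc >> 18) & 63 ))
    b64_char $(( (acc >> 12) & 63 ))
    printf '=='
  elif [ "$n" -eq 2 ]; then
    acc=$((acc << 8))
    b64_char $(( (acc >> 18) & 63 ))
    b64_char $(( (acc >> 12) & 63 ))
    b64_char $(( (acc >> 6) & 63 ))
    printf '='
  fi
  printf '\n'
}
```

For example, `printf 'Man' | to_decimals | encode_decimals` prints `TWFu`. Because `encode_decimals` runs as a whole on the right side of the pipeline, its `while` loop is not itself a pipeline stage, so `n` and `acc` are still available for the padding step after the loop.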
Because we use `read -r`, we haven't tested reading binary files. Fortunately, the shell's `printf` builtin can emit all 256 byte values, so Pnut can at least output binary files. This makes it possible for Pnut to have an x86 backend for reproducible builds.
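As an illustration of how `printf` alone can write any byte, here is a minimal sketch that relies on the `\ddd` octal escapes POSIX `printf` accepts in its format string. `emit_byte` is a hypothetical name; this is not necessarily how Pnut's generated code emits bytes.

```shell
#!/bin/sh
# Hypothetical helper (not Pnut's actual runtime code): write the single
# byte whose value (0-255) is given in $1, using only printf.
emit_byte() {
  # Inner printf: decimal -> octal digits (e.g. 65 -> 101, 255 -> 377).
  # Outer printf: "\101" in the format string is an octal escape, so the
  # raw byte is written. 0 becomes "\0", which writes a NUL byte.
  printf "\\$(printf '%o' "$1")"
}
```

For example, `emit_byte 72; emit_byte 105; emit_byte 10` writes `Hi` followed by a newline. The byte is produced directly on stdout and never stored in a shell variable, which is why NUL is not a problem on the output side.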
Regarding the use of `read`: one constraint we set for ourselves when writing Pnut is to not use any external utilities, including those specified by the POSIX standard (other than `read` and `printf`). This maximizes the portability of the code generated by Pnut and is sufficient for the reproducible-build use case.
We're still looking for ways to integrate existing shell code with C. One way to do this is the `#include_shell` directive, which includes existing shell code in the generated shell script. This makes it possible to call the utilities needed to read raw bytes without Pnut itself depending on less portable utilities.