Hacker News
by wahern 702 days ago
This is very cool, regardless of how serious it was intended to be taken. Before base-64 encoders/decoders became more common as preinstalled commands in the environments I found myself on, I wrote a base64 utility in mostly pure POSIX shell:

  https://25thandClement.com/~william/2023/base64.sh

If this project had existed I might have opted to compile my C-based base-64 encoder and decoder routines, suitably tweaked for pnut's limitations.

I say base64.sh is mostly pure not because it relies on shell extensions, but because the only non-builtins it depends on are od(1) or, alternatively, dd(1) to assist with binary I/O. And preferably od(1), as reading certain control characters, like NUL, into a shell variable is especially dubious. The encoder is designed to operate on a stream of decimal-encoded bytes. (See decimals_fast for using od to encode stdin to decimals, and decimals_slow for using dd for the same.)
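A minimal sketch of the decimal-stream approach, assuming a POSIX sh whose `printf` handles octal escapes in the format string: od(1) turns stdin into decimal byte values, and the encoder packs three bytes into a 24-bit integer with shell arithmetic before emitting four base64 digits. The function names (`b64_digit`, `b64_encode_decimals`, `b64_encode`) are hypothetical; this is not the code from base64.sh.

```sh
#!/bin/sh
# Sketch only: base64-encode stdin using od(1) for binary-safe input and
# POSIX shell arithmetic for the encoding. Function names are hypothetical.

# Print the base64 digit for a 6-bit value (0..63) by computing its ASCII
# code and emitting it through a printf octal escape.
b64_digit() {
  if [ "$1" -lt 26 ]; then c=$(( $1 + 65 ))    # 'A'..'Z'
  elif [ "$1" -lt 52 ]; then c=$(( $1 + 71 ))  # 'a'..'z' (97 - 26)
  elif [ "$1" -lt 62 ]; then c=$(( $1 - 4 ))   # '0'..'9' (48 - 52)
  elif [ "$1" -eq 62 ]; then c=43              # '+'
  else c=47                                    # '/'
  fi
  printf "\\$(printf '%03o' "$c")"
}

# Encode whitespace-separated decimal byte values (0..255) read on stdin.
b64_encode_decimals() {
  n=0 acc=0
  while read -r line || [ -n "$line" ]; do
    for b in $line; do          # od packs several values per line
      acc=$(( (acc << 8) | b ))
      n=$(( n + 1 ))
      if [ "$n" -eq 3 ]; then   # a full 24-bit group: emit four digits
        b64_digit $(( (acc >> 18) & 63 ))
        b64_digit $(( (acc >> 12) & 63 ))
        b64_digit $(( (acc >> 6) & 63 ))
        b64_digit $(( acc & 63 ))
        n=0 acc=0
      fi
    done
  done
  # Flush a trailing partial group, padding with '='.
  if [ "$n" -eq 1 ]; then
    acc=$(( acc << 16 ))
    b64_digit $(( (acc >> 18) & 63 ))
    b64_digit $(( (acc >> 12) & 63 ))
    printf '=='
  elif [ "$n" -eq 2 ]; then
    acc=$(( acc << 8 ))
    b64_digit $(( (acc >> 18) & 63 ))
    b64_digit $(( (acc >> 12) & 63 ))
    b64_digit $(( (acc >> 6) & 63 ))
    printf '='
  fi
  printf '\n'
}

# Binary-safe entry point: od turns raw bytes, NUL included, into decimals.
b64_encode() {
  od -A n -v -t u1 | b64_encode_decimals
}
```

Because the input stays numeric, the shell never holds a raw NUL in a variable: `printf 'a\000b' | b64_encode` should print `YQBi`. Calling `printf` twice per output digit is slow; the sketch favors clarity over speed.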

It looks like pnut uses `read -r` for reading input. In addition to NULs and related raw-byte issues, I was worried about chunking issues (e.g. truncation or errors) on binary data, such as input with no newline within LINE_BUF bytes. Have you tested binary I/O much? Relatedly, how many different shell implementations have you tested your core scheme with? In addition to bash, dash, and various incarnations of /bin/sh on the BSDs, I also tested base64.sh with Solaris' system shells (ksh88 and ksh93 derivatives), as well as AIX's (a ksh88 derivative). AIX had some odd quirks with pipelines even with plain-text I/O. (Unfortunately Polar Home is gone now, so I have no easy way to play with AIX; maybe that's for the better.)


One of the examples we include is a base64 encoder/decoder:

  https://github.com/udem-dlteam/pnut/blob/main/examples/compiled/base64.sh

It doesn't support NULs, as you pointed out, but it's interesting to see the similarities between your implementation and the one generated by Pnut.

Because we use `read -r`, we haven't tested reading binary files. Fortunately, the shell's `printf` command can emit all 256 byte values, so Pnut can at least output binary files. This makes it possible for Pnut to have an x86 backend for use in reproducible builds.
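The `printf` point can be illustrated with a short sketch: formatting a byte's value as a three-digit octal escape and handing that back to `printf` as its format string writes exactly that byte, including NUL and values above 127. The names `emit_byte` and `decimals_to_bytes` are hypothetical and not part of Pnut's generated runtime.

```sh
#!/bin/sh
# Sketch only: write raw bytes using nothing but printf.

# Emit one raw byte given its decimal value (0..255), e.g. 0 -> "\000".
emit_byte() {
  printf "\\$(printf '%03o' "$1")"
}

# Read whitespace-separated decimal byte values on stdin and write the
# corresponding raw bytes to stdout.
decimals_to_bytes() {
  while read -r line || [ -n "$line" ]; do
    for b in $line; do
      emit_byte "$b"
    done
  done
}
```

Fed the decimal stream from `od -A n -v -t u1`, this acts as the inverse of the od step, so `printf 'Man' | od -A n -v -t u1 | decimals_to_bytes` should write `Man` back out.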

Regarding the use of `read`, one constraint we set ourselves when writing Pnut is to not use any external utilities, including those that are specified by the POSIX standard (other than `read` and `printf`). This maximizes portability of the code generated by Pnut and is enough for the reproducible build use case.

We're still looking for ways to integrate existing shell code with C. One way to do this is the `#include_shell` directive, which includes existing shell code in the generated shell script. This makes it possible to call the utilities needed to read raw bytes without making Pnut itself depend on less portable utilities.

Sorry, but since the very goal of base64 is to encode "uncomfortable" bytes, saying that your example doesn't work with uncomfortable bytes is like providing a Fibonacci demo that only works with arguments less than 3, or a clock that only shows the correct time twice a day.

I'd choose a different example to showcase pnut.

In the context of what it seems to be primarily attempting to achieve, assisting in the bootstrapping of more complex environments that depend directly or indirectly on C, I found the base64 example (and even more so the SHA-256 example in the same directory) quite interesting, and evidence of pnut's sophistication notwithstanding its limitations. And as was pointed out, it wouldn't be difficult to hack in the ability to read binary data: just swap in a replacement for the getchar routine, as I've done with od. In fact, that ease is one of the most fascinating aspects of this project: they've built a conceptually powerful execution model for the shell that can be targeted directly when compiling C code, as opposed to going through an intermediate VM (e.g. a P-code interpreter in shell). It has its limitations, but those can be addressed. Given the constraints, the foundation is substantial and powerful even from a utilitarian perspective.
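To make the getchar swap concrete, here is a hypothetical pull-style reader in POSIX sh: it buffers one line of od(1) output at a time and hands out one decimal byte value per call, yielding -1 at end of input much as C's `getchar` returns EOF. The names `getchar_od`, `ch`, `_gc_buf`, and `count_bytes` are made up for this sketch; it is not Pnut's actual runtime code.

```sh
#!/bin/sh
# Hypothetical getchar replacement: reads decimal byte values produced by
# od(1) on stdin and sets $ch to the next byte (0..255), or -1 at EOF.
_gc_buf=

getchar_od() {
  # Refill the buffer from the next line of od output when it is empty.
  # read strips surrounding whitespace, so blank lines are skipped.
  while [ -z "$_gc_buf" ]; do
    if ! read -r _gc_buf; then
      ch=-1
      return
    fi
  done
  # Split the buffer into words; positional parameters are local to the
  # function, so this does not clobber the caller's "$@".
  set -- $_gc_buf
  ch=$1
  shift
  _gc_buf=$*
}

# Example consumer: count input bytes, NULs included.
count_bytes() {
  od -A n -v -t u1 | {
    _gc_buf=
    n=0
    getchar_od
    while [ "$ch" -ge 0 ]; do
      n=$((n + 1))
      getchar_od
    done
    echo "$n"
  }
}
```

The consumer only ever sees small integers, so code written against a `getchar`-style interface does not need to know the bytes came from od rather than `read -r`.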

When people discuss Turing completeness and related concepts, one of the unstated caveats is that neither the concept itself nor most solutions or environments meaningfully address the problem of I/O with the external environment. pnut is kind of exceptional in this regard, even with its limitations.