by CodesInChaos 59 days ago
I never understood why outputting unescaped data is viewed differently from generating unenclosed html.

Like, why doesn't `println` in a modern language like Rust auto-escape output to a terminal, and require a special `TerminalStr` to output a raw string?
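
To make the idea concrete, here is a rough sketch, in Python rather than Rust for brevity. `TerminalStr`, `escape_controls` and `safe_print` are hypothetical names for illustration, not an existing API, and the choice to let newline and tab through is an assumption.

```python
import sys


class TerminalStr(str):
    """Hypothetical marker type: text whose control characters are intentional."""


def escape_controls(text: str) -> str:
    """Make control characters visible instead of letting the terminal act on them.

    Assumption: newline and tab pass through; other C0 controls, DEL and the
    C1 range (0x80-0x9F) are rendered as a backslash-x hex escape.
    """
    out = []
    for ch in text:
        code = ord(ch)
        is_control = code < 0x20 or code == 0x7F or 0x80 <= code <= 0x9F
        if is_control and ch not in "\n\t":
            out.append(f"\\x{code:02x}")
        else:
            out.append(ch)
    return "".join(out)


def safe_print(text: str, stream=None) -> None:
    """Write text plus a newline, escaping controls unless text is a TerminalStr."""
    if stream is None:
        stream = sys.stdout
    if not isinstance(text, TerminalStr):
        text = escape_controls(text)
    stream.write(text + "\n")
```

With this, `safe_print("\x1b[31mred")` shows the escape sequence as literal text, while `safe_print(TerminalStr("\x1b[31mred"))` sends it through untouched, so raw output becomes an explicit opt-in.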

I think the problem is that 1) You want to be able to write arbitrary bytes, including terminal escape sequences, into files. 2) You don't want to accidentally write terminal escape sequences to stdout. 3) Stdout is modeled as a file.

Consider cat. It's short for concatenate. It concatenates the files passed to it as arguments and writes them to stdout, which may or may not be redirected to a file. If it didn't pass along terminal escapes, it would fail at its job of accurate concatenation.

Now I don't mean to dismiss your idea, I do think you are on the right track. The question is just how to do this cleanly given the very entrenched assumptions that led us to where we are.

> that may or may not be redirected to a file

This is usually knowable.
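
For example, a program can ask whether its output stream is attached to a terminal. A minimal sketch in Python, where `is_terminal` is a hypothetical helper name wrapping the standard `isatty()` method:

```python
def is_terminal(stream) -> bool:
    """Return True if the stream reports being attached to a terminal (TTY)."""
    isatty = getattr(stream, "isatty", None)
    # Objects without an isatty() method are treated as "not a terminal".
    return bool(isatty and isatty())
```

Calling `is_terminal(sys.stdout)` gives True in an interactive shell and False when stdout is redirected to a file or a pipe. The "usually" matters: a pipe is reported as not a terminal even when whatever reads from it ends up forwarding the bytes to one.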

It's a different question whether cat should be doing that, though – it's an extremely low level tool. What's wrong with `less`? (Other than the fact that some Docker images seem to not include it, which is pretty annoying and raises the question as to whether `docker exec` should be filtering escape sequences...)

Besides less having a lot of code (features, bloat) and therefore attack surface (some versions of less honor LESSSECURE=1, which on some OSes these days involves some pretty tight pledge(2) restrictions), or that some vendors have configured less by default to automatically run random code so you can automatically page a gzipped file or an attacker can maybe run arbitrary code (whoops!). Besides those issues, and any others I do not know about? Nothing.

Docker images usually have "more" installed. Not quite as useful as "less", but usable enough.

Sometimes you don't want to open stuff in a pager.

To truly fix this would require revisiting some very old fundamentals.

The C0 control set (ASCII 0x00 to 0x1F) contains all sorts of esoteric functions, most of which are generally unused, and only a few of which are useful and could be implemented at a higher level. ESC sequences are only part of the problem.

And this also applies not just to terminals, but to systems programming as well. None of these have any business in e.g. filenames, but they are all commonly permitted. Some systems do forbid them, and it should IMO be universal.

If we really want to fix this, then we would develop a character encoding that strips out all control characters entirely, including LF and CR, and have text be nothing but graphic text characters. Control characters are so entrenched and convenient that it's difficult to see that happening. But I do think routine stripping of all control characters in situations that don't require them would be good for security.
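
A minimal sketch of what that routine stripping could look like, in Python. `strip_controls` and its `keep` parameter are hypothetical names, and dropping DEL (0x7F) in addition to the C0 range is an assumption on my part.

```python
def strip_controls(text: str, keep: str = "") -> str:
    """Drop C0 controls (0x00-0x1F) and DEL (0x7F), except characters in keep."""
    return "".join(
        ch for ch in text
        if ch in keep or not (ord(ch) < 0x20 or ord(ch) == 0x7F)
    )
```

By default this removes LF and CR too, so `strip_controls("a\x1b[31mb\r\n")` yields `"a[31mb"`. Passing `keep="\n"` preserves line breaks for contexts that need them. Note that removing ESC only defuses the sequence: its remaining bytes are left behind as ordinary visible text.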

Because a terminal is not a browser but a screen. Outputting text isn't supposed to trigger anything aside from changing what's on screen.

This is broadly correct, but not entirely. Terminals have historically had additional capabilities, be that ringing a bell (BEL) or outputting to a line printer. There are escape codes dedicated to doing file/tape access and running system commands. Not in wide use, but they do exist. See ECMA-48 for some examples from the '80s.