Hacker News
by tatterdemalion 4061 days ago
This applies more generally than just to logs. I love Unix, but "everything is text" is not actually great. It's better that Unix utils output arbitrary ASCII than that they output arbitrary binary data, but it's obvious why people don't do serious IPC 'the Unix way.' Imagine if instead of exchanging JSON, or ProtoBufs, or whatever, your programs all exchanged text you had to regex into some sort of ad hoc structure. So why do we manage our logs and our pipelines that way? There's no actual reason that the terminal couldn't interpret structured data into text for us so that, in the world of intercommunicating processes on the other side of the TTY, everything is well-structured, semantically comprehensible data.
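To make the contrast concrete, here is a minimal sketch of the two styles of consumer. The log line, its field names, and both helper functions are invented for illustration and are not taken from any real tool.

```python
import json
import re

# Hypothetical log line in an ad hoc text format (invented for this example).
TEXT_LINE = "2015-03-01 12:00:05 ERROR disk=/dev/sda1 usage=93%"

# The same event as structured data; the field names are assumptions.
JSON_LINE = '{"ts": "2015-03-01T12:00:05", "level": "ERROR", "disk": "/dev/sda1", "usage": 93}'

# The text consumer has to hard-code the producer's layout in a pattern.
TEXT_PATTERN = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<level>\w+) "
    r"disk=(?P<disk>\S+) usage=(?P<usage>\d+)%$"
)


def parse_text(line):
    """Recover fields from the ad hoc format; breaks if the layout changes."""
    match = TEXT_PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognized log line: " + repr(line))
    fields = match.groupdict()
    fields["usage"] = int(fields["usage"])  # types must be restored by hand
    return fields


def parse_json(line):
    """The structure and the types travel with the data."""
    return json.loads(line)
```

Both functions return a dict with a numeric `usage` field, but only the regex version needs to change when the producer reorders or renames a column.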
3 comments

This is the PowerShell argument. It's a step in the right direction, but it needs the tooling and user community to come along with it.

The advantage of the traditional unix pipe manipulation tools is that most of them are simpler and faster than regex.

> There's no actual reason that the terminal couldn't interpret structured data into text for us so that, in the world of intercommunicating processes on the other side of the TTY, everything is well-structured, semantically comprehensible data.

I think you just described PowerShell (or things that follow down the same path, e.g. TermKit) ;-)

JSON is text!

Text is not synonymous with unstructured.

Of course JSON is encoded in Unicode, making it "text," but when it is said that text is the universal protocol of Unix, it means that the only guarantee a well-behaved Unix utility can make is that it will output ASCII. You cannot leverage the further structure of JSON or any other protocol because utilities that interpret JSON do not compose with the many Unix utilities that emit non-JSON data.
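A rough sketch of that composition problem, assuming a hypothetical consumer that expects one JSON object per line. The `read_records` helper and the sample lines are made up for illustration.

```python
import json


def read_records(lines):
    """Split input into parsed JSON objects and lines that are not JSON objects."""
    records, rejected = [], []
    for line in lines:
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            # Plain-text output from an ordinary utility lands here.
            rejected.append(line)
            continue
        if isinstance(value, dict):
            records.append(value)
        else:
            # A bare number or string is valid JSON but not a record.
            rejected.append(line)
    return records, rejected


# One line from a hypothetical JSON-emitting tool, one in the style of `ls -l`.
mixed = [
    '{"name": "a.txt", "size": 120}',
    "-rw-r--r-- 1 user user 120 Mar  1 12:00 a.txt",
]

records, rejected = read_records(mixed)
```

The second line carries the same information as the first, yet the JSON consumer can only set it aside, which is the sense in which the two kinds of utility do not compose.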

Only entropic bits are truly "unstructured data." The question is one of how much semantic structure you can rely on in the data you are processing, which is a continuum.