by anon1385 4269 days ago
I'm honestly not sure if your post is meant to be satire or not.

>lists: ls -1 | wc -l

The computer has taken a real array (probably an array in C), joined all the items together into one big string using magical characters as dividers, and then split it again on those magic characters to try and reconstruct the metadata that it threw away. I think the problem is pretty obvious and well known.
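
To make the failure concrete, here is a minimal Python sketch of that join-then-split round trip. The filenames are made up for illustration, and the two functions are hypothetical stand-ins that only approximate what `ls -1` and `wc -l` do; they are not the real implementations.

```python
# Hypothetical directory listing: three files, one with a newline in its name.
names = ["notes.txt", "report\nfinal.txt", "todo.txt"]

def emit_like_ls(entries):
    """Join entries into one newline-terminated text stream, roughly as `ls -1` would."""
    return "".join(entry + "\n" for entry in entries)

def count_like_wc(stream):
    """Count newline characters, which is what `wc -l` reports."""
    return stream.count("\n")

stream = emit_like_ls(names)
true_count = len(names)              # what the original array knew
piped_count = count_like_wc(stream)  # what survives the text stream

print(true_count)   # 3
print(piped_count)  # 4
```

The consumer has no way to tell the newline inside a name from the newline between names, so it reports one file too many without any error.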

>would unix be better if it were cwd.files.count

Well, at least that is going to give you the correct result. Correctness seems like it should be pretty important, no?

Are you really arguing for shells being easier to learn using an example of a complicated command with 4 pipes, 2 different quoted strings, several single-letter arguments, and that requires implicit knowledge about the structure of the output from several commands? Compared to a much shorter, simpler, type-safe, and self-documenting bit of code?

Also, yes my original post was hyperbolic. That was because I was responding to a histrionically overwrought post claiming that unix is perfect.


> Well, at least that is going to give you the correct result. Correctness seems like it should be pretty important, no?

The devil is in the details. In OP's example, is "files" a field of "cwd," and "count" a field (or getter) of "files?" Is "filter" a method of "processes" and "map" a method of the resulting (implicit) list returned by "filter"?

If the answer to any of these is "yes," then you will find yourself needing to implement these fields and methods (and probably others) for each OS object. The "filter" in "processes" necessarily has a different implementation from "filter" in "files", since despite having the same external interface, they both operate in different contexts on different internal state (i.e. a process object is not a file object).

Contrast this with the UNIX approach, where the "filter" and "map" implementations (i.e. grep, awk, sed, tr) exist independent of OS-level objects (i.e. processes, files, semaphores, sockets, etc.) and their state, allowing them to be developed and improved independently of one another and the data they operate on.

You want there to be some notion of type safety and structure in IPC. This can already be achieved: simply rewrite your programs to communicate via ASN.1, JSON, protobufs, or some other common structured marshalling format. You can have ls send wc an array of strings, instead of having wc guess which bytes are strings by assuming that newlines delimit them.
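
As a rough sketch of what that could look like, assuming JSON as the marshalling format and made-up filenames, the producer serializes the list itself and the consumer parses it back. Neither ls nor wc actually works this way; the variable names here are hypothetical.

```python
import json

# Hypothetical listing, including a name with an embedded newline.
names = ["notes.txt", "report\nfinal.txt", "todo.txt"]

# Producer side: serialize the list, so element boundaries are part of the format.
payload = json.dumps(names)

# Consumer side: parse the structure back instead of guessing at delimiters.
received = json.loads(payload)
count = len(received)

print(count)  # 3
```

Note that this sketch ignores one real complication: Unix filenames are byte strings and need not be valid Unicode, so a JSON-based protocol would still have to decide how to encode them.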

However, upon doing this, you will find that you will only be able to use wc with programs that speak wc's protocol. Now if you're lucky, you can convince everyone who wants to send data to wc to use the new protocol. If you're unlucky, you'll instead end up with a bunch of programs that only implement it partially or with bugs. If you're really unlucky, there will also be competing versions of the wc protocol. Moreover, what about programs that wc will need to pipe data to? wc will need to know how to communicate with all of them as well.

My point is, if we go the route of imposing structure and types on IPC, the only thing you'll have to show for it is O(N^2) IPC protocols for N programs, which they all must implement perfectly (!!) to get type safety. Want to write a new program? Now you also have to write O(N) additional IPC protocols so it can communicate with the others.
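
The arithmetic behind that claim can be sketched in a few lines of Python. This assumes the worst case described above, where every ordered producer/consumer pair needs its own protocol; the function names are made up for illustration.

```python
def pairwise_protocols(n):
    """Ordered producer -> consumer pairs among n programs, one protocol each."""
    return n * (n - 1)

def new_program_cost(n):
    """Protocols a newcomer adds to talk to, and hear from, each of n existing programs."""
    return 2 * n

print(pairwise_protocols(10))  # 90
print(new_program_cost(10))    # 20
print(pairwise_protocols(11) - pairwise_protocols(10))  # 20
```

The last line is just a consistency check: going from 10 to 11 programs adds exactly the 2N protocols the newcomer has to implement.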

Maybe you can additionally mandate that each program speaks the same IPC protocol (i.e. there is One True Record for piping data in, and One True Record for piping data out). But, if this IPC protocol encompasses every possible use-case, how is it any different than a byte stream?

So your argument is that ls is not efficient enough in its implementation, but that you would suggest replacing all of that with an object model that implements the equivalent of Ruby's enumerable. Got it.

Shell is easy to learn because people innately understand "and then do this with it". You can start with ls, get to ls -1, then think, I want to count these, and get to wc -l.

Yeah, there are exceptions -- e.g., files with newlines in their name -- and yeah, the interface could use some cleanup. But pedagogically, I can assure you that teaching people shell is easier than teaching them map-reduce.

>So your argument is that ls is not efficient enough in its implementation

My complaint wasn't about the efficiency of ls, it was the fact that valuable information that is required for correctness is thrown away to achieve compatibility with the unix 'stream of text' interface and the attempt to recover that information leads to incorrect results. The paradigm is just fundamentally broken.

>you would suggest replacing all of that with an object model that implements the equivalent of Ruby's enumerable. Got it.

I don't even like Ruby at all (ironically, I find its grammar far too complex and shell-like to be able to parse in my head), so I've no idea what you are talking about. You seem to be assuming that everybody who dislikes the shell must be some strawman hipster.

>there are exceptions -- e.g., files with newlines in their name

I honestly find it extremely bewildering that any programmer would see that as being acceptable. It's not just that it fails to give the correct result, it fails silently. Silent data corruption is surely just about the worst class of bug.

>But pedagogically, I can assure you that teaching people shell is easier than teaching them map-reduce.

I presume you mean the functional ideas of map, reduce, and filter, not MapReduce ( https://en.wikipedia.org/wiki/Map-reduce ). The latter is irrelevant to the discussion.

My experience is the exact opposite. I found understanding the concepts of map and filter trivial. If you can understand a loop you can understand them. Reduce/fold isn't hard to understand either, although a bit trickier to make use of. Your example didn't use reduce anyway. Map and filter are typically much easier to use and reason about than a for loop in C, or a chain of commands in a shell script.
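
As a rough illustration of why a loop and map/filter are the same idea, here is a small Python sketch. The data and variable names are made up; it is not the example from earlier in the thread.

```python
# Hypothetical file sizes in bytes.
sizes = [120, 4096, 77, 2048]

# Explicit loop: keep sizes of at least 1 KiB, then convert each to whole KiB.
big_kib_loop = []
for size in sizes:
    if size >= 1024:
        big_kib_loop.append(size // 1024)

# The same steps as filter (the `if`) followed by map (the conversion).
big_kib = list(map(lambda s: s // 1024, filter(lambda s: s >= 1024, sizes)))

print(big_kib_loop)  # [4, 2]
print(big_kib)       # [4, 2]
```

The filter call is the `if` from the loop and the map call is the body, with the bookkeeping of building the result list removed.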

The shell is an absolute nightmare to learn. I have tried to learn to use it numerous times over the last decade or so, and I have always forgotten it by the next time I come to do anything in the shell. The amount of knowledge you need to actually do anything is huge (the awk language, obscure and terse command names, complex regexes, memorising a bunch of command flags, memorising the output format of commands - usually a format designed for displaying to users rather than machine parsing, the shell's ridiculously complex grammar, how to escape things, etc.). Your example illustrates that. It would have taken me 20 mins at least to put together that line of code you gave. Also, it's not like you escape having to understand concepts like map and filter. If you don't understand them (not necessarily by name) then you won't be able to write the line of unix commands you gave.

>Shell is easy to learn because people innately understand "and then do this with it".

People might find the concept of piping data easy to understand (I'm not convinced they do to be honest), but that alone won't do them much good because as your examples showed, you always need to run a bunch of complex and obscurely named commands, regex, or awk on the data to make the next command able to understand it.

>> there are exceptions -- e.g., files with newlines in their name

> I honestly find it extremely bewildering that any programmer would see that as being acceptable. It's not just that it fails to give the correct result, it fails silently. Silent data corruption is surely just about the worst class of bug.

Yes! We should strive to build our software on solid, non-leaky abstractions as much as possible, so that exceptions like a filename with an odd character in it just don't exist. Until we reach that point, computers will continue to frustrate their users for no good reason.

I didn't see any argument about performance. I only saw an argument about correctness.