Hacker News
by klibertp 4255 days ago
I have one question. I learned some J some time ago, but never really talked about it with anyone, so my programs (a few lines each, really) were always written with long, meaningful variable names. I read your explanation, and every time you wrote "we don't know what it is yet" I wondered "why the heck isn't it just appropriately named?". I mean, why is 'c' better than something like 'nl_pos', for example? I get that reading J, K or APL programs requires some serious work, and I'm OK with that, but why would I need to burden my short-term memory with one- or two-letter identifiers on top of that?

This is an honest question, and I feel like there is some upside to those names that I just keep missing. As I said, I'm not fluent in J, but while learning it I wrote and read quite a bit of it, and I only made it through some of the longer (like, longer than half a line!) examples thanks to a sheet of paper and sheer determination. I was often working through a fairly complicated expression and starting to see what it was about, only to be stopped by an 'x' or a 'c': I then had to go back a couple of lines, read the definition of 'x' again, and retry parsing the line from the beginning, hoping I would remember what 'x' was this time. I started taking notes for this reason (it worked quite well, I think).

Anyway, you seem to have no problems reading such code, so I figured I'd ask you: why and what is this style of naming good for, and what one needs to do to master it?


J is abstract. It seems like it would be easier if it were concrete, but I find that concreteness tends to lead to assumptions and encourages scanning instead of reading.

When I sit down to a program, I have an idea about what I want to do. I don't have Home/End keys on my keyboard, and I find pressing Fn-Left and Fn-Right annoying because I don't usually use them (I use emacs sometimes). So when I sit down to edit, I want to add emacs keys: C-a and C-e to jump to the start and end of the line, and C-d to delete a character.

I can see at http://kparc.com/z.txt that hx handles home/end, and that cX is control-X, so I think it should be something like:

    ca:{hx 0 -1};ce:{hx 0 1}
Now I look at the characters spent: is it possible to say this more simply? I don't think so, but I'd love for someone to show me.

How about delete? I think what I want to do is select the next character and remove it. So where is the cursor? I remembered that hx moves the cursor, so I read its definition:

    hx:{L i+d*x}
Okay, so I remember d is "dimensions" and "i" is something, and L I haven't seen yet. So I go look at L:

    L:{K@B/0|x&d-1}
Now I have two new words: K and B to look up:

    B:{c[x]&y+b x}
    K:{J(;|\(*k),)[H]x}
Okay, B is straightforward: b is indexed by line, so b x returns the offset of line x. We add this to y (the second value of 0|x&d-1) and take the min of that and c[x], the newline offset for line x. At this point I might guess that B converts an x/y coordinate into a linear offset. But what about K?
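For anyone who thinks better in a conventional language, here is a rough Python model of the chain read so far (hx, L and B). It is a sketch under assumptions, not the kparc code: the reading of b and c follows the walk-through above, while treating d as (rows, cols), treating i as the current (row, col), and the helper line_tables are assumptions made for illustration. It leaves out K, which sets the selection. Under those assumptions it also suggests why hx 0 -1 and hx 0 1 work as Home and End: scaling the direction by d overshoots, and the clamp in L snaps back to the edge of the line.

```python
# A rough Python model of the kparc editor's cursor helpers, as read above.
# Assumptions (not from the kparc source): the buffer is a string in which every
# line ends with "\n", b holds each line's start offset, c holds each line's
# newline offset, d is (rows, cols), and i is the current (row, col).


def line_tables(text):
    """Hypothetical helper: build b (line starts) and c (newline offsets)."""
    b, c = [], []
    start = 0
    for pos, ch in enumerate(text):
        if ch == "\n":
            b.append(start)
            c.append(pos)
            start = pos + 1
    return b, c


def B(x, y, b, c):
    # B:{c[x]&y+b x}  line start plus column, capped at the line's newline
    return min(c[x], y + b[x])


def L(x, d, b, c):
    # L:{K@B/0|x&d-1} without the final K: clamp each coordinate into
    # [0, d-1], then fold B over the (row, col) pair to get a linear offset
    row, col = (max(0, min(v, n - 1)) for v, n in zip(x, d))
    return B(row, col, b, c)


def hx(x, i, d, b, c):
    # hx:{L i+d*x}  scaling the direction by d overshoots on purpose,
    # and L's clamp snaps the result to the edge of the line
    return L([p + n * step for p, n, step in zip(i, d, x)], d, b, c)


b, c = line_tables("hello\nhi\n")      # ([0, 6], [5, 8])
print(hx([0, -1], [1, 1], (2, 80), b, c))  # 6: start of line 1 (C-a)
print(hx([0, 1], [1, 1], (2, 80), b, c))   # 8: the newline ending line 1 (C-e)
```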

K is using J and H (but it turns out we only need J):

    J:{k::2#0|x&-1+#a}
And here we see the definition of k: x&-1+#a is the smaller of x and the index of the last character in the file (a), and 0| takes the max of that and zero. This reads like "bounds-check x to the shape of a". Taking 2 from that simply takes two values and assigns them to k, so if x is only one value, k will still have two.
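The same reading of J can be sketched in Python. This is a model, not the real function: it assumes x is an int or a non-empty list of ints and a is the buffer as a string, and it returns the pair instead of assigning a global k. The cyclic repeat imitates how 2# behaves when given fewer than two items.

```python
def J(x, a):
    """Rough model of J:{k::2#0|x&-1+#a}; returns what would be stored in k.

    Assumes x is an int or a non-empty list of ints and a is the buffer (a str).
    The real J assigns the global k; this sketch just returns the value.
    """
    last = len(a) - 1  # -1+#a: index of the last character
    xs = list(x) if isinstance(x, (list, tuple)) else [x]
    clamped = [max(0, min(v, last)) for v in xs]  # 0|x&...: clamp into [0, last]
    # 2#: take two items, repeating cyclically if there are fewer than two
    return [clamped[n % len(clamped)] for n in range(2)]


print(J(7, "hello\n"))        # [5, 5]: one value out of range, clamped and doubled
print(J([-3, 2], "hello\n"))  # [0, 2]: a pair, each end clamped
```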

k is the cursor selection.

From here, I can make an attempt at implementing delete. My first attempt looked something like:

    cd:{a::((0,k)_a),(((k+1),(#a)-(k+1)))_a}
Oh! But I've seen a lot of these terms before! Maybe I can do better. I note that kx is documented at http://kparc.com/z.txt as the callback for keystrokes, and I can come up with:

    cd:{K j+!2;kx""}
This seems about as simple as I can make it, though I'd like to see someone do better. Knowing that K is a setter for the cursor selection and j is the offset of the cursor, j+!2 simply returns offset and offset+1; K selects that character, and kx"" removes it.
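To compare the two attempts side by side, here is a Python sketch with hypothetical stand-ins: the buffer is a string, the selection is a half-open (start, end) pair, select clamps the way J does, and type_text plays the role of kx by replacing the selection with the typed text. None of these names come from kparc; they only model the reading above.

```python
# Hypothetical stand-ins: the buffer is a str, the selection k is a
# half-open (start, end) pair, and "typing" replaces the selection.


def select(a, x):
    # Like J: clamp each offset into [0, len(a)-1]
    last = len(a) - 1
    return [max(0, min(v, last)) for v in x]


def type_text(a, k, s):
    # Stand-in for kx: replace the selected span with the typed text s
    start, end = sorted(k)
    return a[:start] + s + a[end:]


def delete_by_slicing(a, j):
    # The idea behind the first attempt: keep what's before j and after j
    return a[:j] + a[j + 1:]


def delete_forward(a, j):
    # cd:{K j+!2;kx""}  select offsets j and j+1, then "type" nothing
    k = select(a, [j + n for n in range(2)])  # j+!2 is j, j+1
    return type_text(a, k, "")


print(delete_forward("hello\n", 1))     # hllo
print(delete_by_slicing("hello\n", 1))  # hllo
```

Both give the same result in the middle of the buffer. Because the sketch borrows J's clamp, deleting at the very last index is a no-op in this model, whereas plain slicing would remove that character.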

Now maybe if k were called cursorSelection I might have gotten my first cd faster, but I would've missed the opportunity to see how these functions and variables were interconnected and I might not have written the second one.

I noticed, however, that not having to scroll really helped this exercise. I just moved my eyes around and jotted a few symbols down on my notepad. I feel like if I had had to scroll or switch windows, I would not have been able to do this.

As for your last question: What does one need to master it? For now, I would say practice. Try writing a program in as dense a manner as possible. Remove as much redundancy as you can. Read it and re-read it until you feel like it can be no more abstract.

I am working on a much better answer, but I hope that one will do for now.

My guess is that the language's "day job" uses many similar-but-ever-so-slightly-different intermediate variables, which are difficult to name descriptively in any way that aids understanding more than just re-reading the definition. Long descriptive names have a concrete cost: they impair your ability to recognize visual patterns and turn common compositions into "pictographs", and they don't refine well. Instead of giving a variable an index and adding a bit to its definition, the author has to come up with a bundle of new similar-but-slightly-and-meaningfully-different names, from which someone else (including their future self) must then reverse-engineer the definition.

The shortcut of using descriptive names just doesn't have the same ROI in all kinds of code. The first time I was forced to abandon my descriptive-naming ways was when I started writing finite element solvers. I don't think it's much of a stretch to believe that (some areas of) finance have the same cost/benefit profile. Since this is a desktop programming example, it's relatively easy to come up with good descriptive names, so the short names are almost certainly a holdover.

Or it's just a macho thing. There's enough pixie dust floating around this press release that I wouldn't be surprised.

> why and what is this style of naming good for, and what one needs to do to master it?

It's bullshit. The letters are not letters; they're symbols. Replace them with JPGs of Pokémon if you want; they might as well be ancient hieroglyphs. If you forget what they all mean, you have to read the code and keep track of which variable name holds what, or make a note on some paper.

This is why one-letter variable names are an antipattern: when there's no rhyme or reason to which value is associated with which symbol (why is `c` the index of the newline and not `n`?), your brain isn't going to remember it, and you've instantly forgotten what that code does. Everyone after you has to do the same thing: re-learn how each line of code works every time they work with it. More importantly, looking at one dense line of code gives no context about what the code around it does, whereas in C or Python a single function can give you a decent idea of its purpose.

It's even more of a nonsense issue because the actual name of the variable is just a number: every one-letter variable name maps to an integer. Why not support proper identifiers and replace them with unique integers at runtime? The same goes for whitespace: it doesn't need to be kept in the program at runtime, but it makes the application infinitely more readable because units of logic can be grouped on their own lines.
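A minimal sketch of the idea in this comment: resolve descriptive identifiers to integer slots in a compile step, so the running program only ever indexes into an array. It is a generic illustration of name interning using a made-up two-instruction toy language in Python, and it makes no claim about how K, J or APL interpreters are actually implemented.

```python
from typing import Any, Dict, List, Tuple


def compile_names(program: List[Tuple[Any, ...]]) -> Tuple[List[Tuple[Any, ...]], Dict[str, int]]:
    """Replace every identifier in a toy program with an integer slot number.

    Hypothetical instruction forms:
      ("set", name, value)      name := value
      ("add", dst, lhs, rhs)    dst := lhs + rhs
    """
    slots: Dict[str, int] = {}

    def slot(name: str) -> int:
        # First sighting of a name gets the next free integer
        return slots.setdefault(name, len(slots))

    compiled: List[Tuple[Any, ...]] = []
    for op, *args in program:
        if op == "set":
            name, value = args
            compiled.append(("set", slot(name), value))
        elif op == "add":
            compiled.append(("add", *[slot(n) for n in args]))
        else:
            raise ValueError(f"unknown op {op!r}")
    return compiled, slots


def run(compiled: List[Tuple[Any, ...]], nslots: int) -> List[int]:
    # At runtime no names remain: every variable is just a list index
    env = [0] * nslots
    for op, *args in compiled:
        if op == "set":
            env[args[0]] = args[1]
        else:
            dst, lhs, rhs = args
            env[dst] = env[lhs] + env[rhs]
    return env


program = [
    ("set", "line_count", 2),
    ("set", "newline_pos", 8),
    ("add", "total", "line_count", "newline_pos"),
]
compiled, slots = compile_names(program)
print(compiled)                   # [('set', 0, 2), ('set', 1, 8), ('add', 2, 0, 1)]
print(run(compiled, len(slots)))  # [2, 8, 10]
```

The long names exist only in the source and the slots table; the compiled instructions carry the same integers a one-letter scheme would.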

So to answer your question: no, this is not good for anything and no, you shouldn't try to master it because there are much more useful things that you could be doing with your life.

One reason for one-character names is the habit, in these languages, of sidestepping the naming problem. Naming things is one of the hardest problems in software. APL programs are ideally read as "variable X is ...", not "do this, then this, then this", so an appropriate name for a variable would be something like "max of reduce by difference...", hence the difficulty of coming up with good names.

Another reason is the same as in math. When you write equations on a whiteboard (the long-sought gold standard of expressiveness), you don't use long names; at best you encode names with subscripts, indices, etc., but the names themselves are usually pretty short. APL languages follow the same rationale.

Just as you need to read every line of a math equation carefully to understand what's going on, you have to read each symbol of a program in an APL-family language carefully. That's unusual for programmers who are used to more help along the way, but the vocabulary of all such languages is quite small and doesn't grow often. This is another reason APL could be used for teaching math.

I don't agree that this style of naming is not good for anything; the majority of programmers in these languages seem to agree with it. Regarding much more useful things:

"If you are interested in programming solutions to challenging data processing problems, then the time you invest in learning J will be well spent." (http://jsoftware.com/)