by kiruwa 3671 days ago
All hail strpbrk()!

Do you have any defense of that? I'd be interested to read it.

I think every article on coding advice I've read for the last decade favors long and descriptive function/variable names.


Defense of strpbrk and strspn:

The "break" and "span" terms were familiar from a time when more people knew the Snobol language, which has two frequently useful pattern matching operators: BREAK and SPAN. Snobol's BREAK(S) pattern matching operator matches the input up to but not including the single-character match for any of the characters in set S. The set S delimits or "breaks" the sequence. SPAN(S) matches a sequence of one or more characters from the set S.
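A minimal sketch of how the two Snobol operators map onto the standard C functions. The names span_len and break_len are made up for illustration. Unlike SPAN, which needs at least one matching character, strspn simply returns 0 for an empty run, so treat this as an approximation of the Snobol behavior rather than an exact port.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, roughly SPAN(set): length of the leading run of
   characters in s that all belong to set. Returns 0 for an empty run. */
size_t span_len(const char *s, const char *set)
{
    assert(s != NULL && set != NULL);
    return strspn(s, set);
}

/* Hypothetical helper, roughly BREAK(set): length of the leading run of
   characters in s that do NOT belong to set. If no character of set
   occurs in s, this is the length of the whole string. */
size_t break_len(const char *s, const char *set)
{
    assert(s != NULL && set != NULL);
    return strcspn(s, set);
}
```

For example, span_len("123abc", "0123456789") counts the leading digits, and break_len("key=value", "=") gives the length of the part before the '='.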

strpbrk tries to fit "string" "pointer" and "break" into a symbol that is different in the first six characters. It was once a common linker limitation that only the first six characters of external symbols were stored.

Actually, the C function that corresponds to the concept of BREAK is strcspn (complemented span), because it gives the length of the range of characters up to the first match in the set. That is to say, strcspn could have been called strbrk! Then we would have had strspn and strbrk as a complementing pair. In any case, the strpbrk function points one character past the substring indicated by strcspn, giving a pointer to the breaking character. I think the following equivalences hold, provided some character of set occurs in str (otherwise strpbrk returns a null pointer, while strcspn returns the length of str):

   strcspn(str, set)        <-->  strpbrk(str, set) - str;
   str + strcspn(str, set)  <-->  strpbrk(str, set);
which further supports strbrk as a good name for strcspn.
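A quick way to check those equivalences, as a sketch. The helper name brk_functions_agree is invented here. One case needs separate handling: when no character of set occurs in str, strpbrk returns a null pointer while strcspn returns strlen(str), so the pointer arithmetic only applies when a match exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if strpbrk and strcspn agree on where
   the first character from set occurs in str, and 0 otherwise. */
int brk_functions_agree(const char *str, const char *set)
{
    assert(str != NULL && set != NULL);

    const char *p = strpbrk(str, set);  /* pointer to the breaking char, or NULL */
    size_t n = strcspn(str, set);       /* length of the prefix before it */

    if (p == NULL) {
        /* No breaking character: strcspn spans the entire string. */
        return n == strlen(str);
    }

    /* Both forms of the equivalence from the comment above. */
    return (size_t)(p - str) == n && str + n == p;
}
```

Calling brk_functions_agree("hello, world", ", ") exercises the matching case, and brk_functions_agree("hello", ",") exercises the null-pointer case.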

Trivia: break and span appear as functions in Scheme's SRFI 1, by Olin Shivers [1998]. I think these correspond to take-while and drop-while in Clojure and imitations thereof, like the Emacs Lisp dash library.

http://srfi.schemers.org/srfi-1/srfi-1.html#span

Sure, within the context of this generation of string-handling functions, the particular example of strpbrk makes sense.

But there's a reason that this sort of name has been left behind in modern APIs.

Yes; that reason being that we don't have the same strict limits on the number of characters in an external name, and we simply have too many APIs to remember to be able to give six-character mnemonics to them.

Short names are not left behind in core languages. For instance, a function that gives the length of a list, string or other sequence is often called length or len, and not length_of_sequence or whatever.

Arc, in which HN is programmed, has reduced "lambda" to "fn". "fn" is the same sort of shortening as using "pbrk" for "pointer to break".

Ruby has shortened "print" to "p".

For the basic core of a language, shortening names is good. When you're reading code, the short names by their very brevity tell you "I'm a thing in the core language, and not some external API to some add-on lib", which also has connotations of "I might be useful in many contexts; it may be worth it to learn about me and remember me".

> For the basic core of a language, shortening names is good.

You do have to admit the slight irony of the shortened string-handling function names in C, given this fact.

You mean, the irony that the names are short, but mountains of code are dedicated to getting string handling right in C, and still get some of it wrong?
strpbrk() is a bit too short for my taste, but I'll choose that over FindFirstOccurrenceOfAnyCharIn(). There's "long and descriptive", and there's "too long".

Think about it: there's a reason why human languages are filled with pronouns. If you use the full name for everything, communication soon becomes tedious and actually harder to understand.

Any desire for a long and descriptive name should be tempered with a matching desire for conciseness. Otherwise you end up with names like MaybeUpdateDisplayParameterListForValidation. Throw twenty of these names onto a screen, and I have no idea what the hell is going on: I can't even figure out which names are the same at a glance.

Human languages also solve it by taking into account context - and so do PLs. For example, if your language can overload functions based on argument types, then "Char" is already redundant.
The problem with these examples of verbose functions is not that they're verbose, it's either that they encompass too much functionality or poorly describe what they do.
Descriptive function names - yes, of course. With one function call per line.

But long variable names?!? There is no place for such an abomination under this sun.

Including class variables? Their context is far larger than the immediate code you're looking at, so longer, descriptive names seem basically required there, as they have essentially the same scope as functions. (The same logic applies to making them longer.) You'll note that was my original example.
> Including class variables?

You mean field names? A paradigm where you have such a thing is _awful_. Seriously. And this is exactly one of the reasons why it is awful and why it leads to unreadable and unmaintainable code.

Are we anti-OOP then?

And no, I don't believe you. Making your variable names more descriptive does not make your code less readable. Even when it's not strictly necessary, it's not harmful to comprehension UNLESS you refuse to move from a completely outdated line width.

Some people argue that it takes longer to type and edit. But as we all know, we spend far more time reading than writing code, and modern editors (like vim!) have solved this non-problem anyways.

It takes longer to read, and this is what is important.

And, yes, OOP is filth.

I don't think either of these is true, but now I understand where you're coming from.
Long variable names are great. When used in conjunction with redundant variables for intermediate computation results (effectively naming those results), they can make a terse but complex expression much easier to read and understand, to the point where something that required a comment no longer does so.
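To illustrate with a small sketch (the leap-year rule is just a convenient stand-in, and both function names are made up), here is the same test written once as a terse expression and once with named intermediate results:

```c
#include <stdbool.h>

/* Terse version: correct, but the reader has to decode the rule. */
bool is_leap_terse(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Same logic with redundant, descriptively named intermediates.
   The names carry what a comment would otherwise have to explain. */
bool is_leap_named(int year)
{
    const bool divisible_by_4 = (year % 4 == 0);
    const bool is_century_year = (year % 100 == 0);
    const bool divisible_by_400 = (year % 400 == 0);

    /* Century years are leap years only when divisible by 400. */
    const bool century_rule_satisfied = !is_century_year || divisible_by_400;

    return divisible_by_4 && century_rule_satisfied;
}
```

Both functions return the same result for every year; the second just spends a few extra lines naming each step.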