by kazinator 3671 days ago
Defense of strpbrk and strspn:

The "break" and "span" terms were familiar from a time when more people knew the Snobol language, which has two frequently useful pattern matching operators: BREAK and SPAN. Snobol's BREAK(S) pattern matching operator matches the input up to but not including the single-character match for any of the characters in set S. The set S delimits or "breaks" the sequence. SPAN(S) matches a sequence of one or more characters from the set S.

strpbrk tries to fit "string", "pointer", and "break" into a symbol that is unique within its first six characters. It was once a common linker limitation that only the first six characters of external symbols were stored.

Actually, the C function that corresponds to the concept of BREAK is strcspn ("complemented span"), because it gives the length of the range of characters up to the first match in the set. That is to say, strcspn could have been called strbrk! Then we would have had strspn and strbrk as a complementing pair. In any case, strpbrk returns a pointer to the character just past the substring indicated by strcspn; that is, a pointer to the breaking character. I think the following equivalences hold:

   strcspn(str, set)        <-->  strpbrk(str, set) - str;
   str + strcspn(str, set)  <-->  strpbrk(str, set);
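These hold provided some character of set actually occurs in str; otherwise strpbrk returns a null pointer, while strcspn returns strlen(str). That caveat aside, they further support strbrk as a good name for strcspn.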

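Trivia: break and span appear as functions in Scheme's SRFI 1, by Olin Shivers [1998]. I think these correspond to take-while and drop-while in Clojure and imitations thereof like the Emacs Lisp dash library: span splits a list into the longest prefix satisfying a predicate (what take-while returns) and the remainder (what drop-while returns), much like Clojure's split-with, and break does the same with the predicate negated.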

http://srfi.schemers.org/srfi-1/srfi-1.html#span


Sure, within the context of this generation of string-handling functions, the particular example of strpbrk makes sense.

But there's a reason that this sort of name has been left behind in modern APIs.

Yes; the reason being that we no longer have such strict limits on the number of characters in an external name, and we simply have too many APIs to remember to give them all six-character mnemonics.

Short names are not left behind in core languages. For instance, a function that gives the length of a list, string or other sequence is often called length or len, and not length_of_sequence or whatever.

Arc, in which HN is programmed, has reduced "lambda" to "fn". "fn" is the same sort of shortening as using "pbrk" for "pointer to break".

Ruby has "p", a terse relative of "print" (it prints the inspect form of its arguments).

For the basic core of a language, shortening names is good. When you're reading code, the short names by their very brevity tell you "I'm a thing in the core language, and not some external API to some add-on lib", which also has connotations of "I might be useful in many contexts; it may be worth it to learn about me and remember me".

> For the basic core of a language, shortening names is good.

You do have to admit the slight irony of the shortened string-handling function names in C, given this fact.

You mean, the irony that the names are short, but mountains of code are dedicated to getting string handling in C right, and still getting some of it wrong?