Hacker News
by cookiecaper 3752 days ago
Great post. This is one of my biggest pet peeves with Ruby and languages like it; they encourage developers to "show off" by using the most esoteric features they can find (even better if these features use weird symbols).

Compared to a language like Python that has few neuroses, the same developer writes much less readable code.

The post may have been better if the author included a Python sample that does the same thing:

    long_string = 'ajix mxozl xoap'
    new_string = []
    for word in long_string.split(' '):
        new_string.append("{}{}".format(word[0].upper(), word[1:]))
    new_string = ''.join(new_string)
Admittedly there are a few annoying symbols in here, mostly due to the non-intuitive operation of the join function and the well-meaning but less-than-ideal string formatting syntax (which could've been avoided if we used conventional string concatenation, and which PEP 498 attempts to improve). Also admittedly, coders who want to show how smart they are would try to use a list comprehension to do this, which is slightly less readable and imo shouldn't be used over this format without a good reason. But it's still much easier to parse than the Ruby version because everything is explicitly spelled out in the loop, and you generally shouldn't have to consult the language docs to look up 4 different rarely-seen operators.
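
For comparison, here is a rough sketch of the two alternatives mentioned above: plain string concatenation, and the f-string syntax that PEP 498 added in Python 3.6. The function names are made up for illustration; both keep the explicit loop and the `''.join` of the sample above.

```python
long_string = 'ajix mxozl xoap'

def capitalize_concat(text):
    # Plain concatenation: no format string needed.
    words = []
    for word in text.split(' '):
        words.append(word[0].upper() + word[1:])
    return ''.join(words)

def capitalize_fstring(text):
    # PEP 498 f-string (requires Python 3.6 or later).
    words = []
    for word in text.split(' '):
        words.append(f"{word[0].upper()}{word[1:]}")
    return ''.join(words)

print(capitalize_concat(long_string))   # AjixMxozlXoap
print(capitalize_fstring(long_string))  # AjixMxozlXoap
```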

Remember, debugging is twice as hard as authoring, so if you write the most clever code possible, you are by definition not intelligent enough to debug it. ;)

I understand that people can't be expected to stick to that on their own, so they need languages that promote function, uniformity, and ease of use over showmanship.

The simplicity of the code is one of the main things I look for in interviews. If you could've done something with the conventional, simple language construct that doesn't require someone to go back and refer to the docs, even if it takes more lines, but instead you used the super-arcane construct to prove how well you know the language or to "get it all on one line", I'm going to look on that pretty dubiously. The last thing I want to deal with on my projects is the residue of someone's ego impacting our ability to read their code and get things done quickly and easily.

P.S., in Python, there are a couple of shortcuts for capitalizing each word in a sentence: the str.title() method and the string.capwords() function.
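
A quick sketch of how the two shortcuts differ. Note that capwords is a function in the standard `string` module rather than a method on str, and that both shortcuts lower-case the rest of each word. The sample sentence is made up for illustration.

```python
import string

sentence = "they're ajix mxozl"

# str.title() treats any non-letter as a word boundary,
# so the letters after the apostrophe are capitalized too.
print(sentence.title())            # They'Re Ajix Mxozl

# string.capwords() splits on whitespace, capitalizes each word,
# and rejoins the words with single spaces.
print(string.capwords(sentence))   # They're Ajix Mxozl
```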


Something like the following:

    capitalize = lambda word: "{}{}".format(word[0].upper(), word[1:])
    new_string = ' '.join(map(capitalize, long_string.split(' ')))
or:

    new_string = ' '.join(["{}{}".format(word[0].upper(), word[1:]) for word in long_string.split(' ')])
is much clearer and simpler to me, because it explains the intention of the code: split the string into words, map each word to its capitalized version, then join it again. With the loop, I have to mentally "run" the code to realize that the for loop and append are to go through the split version and add the converted words to a new list.

Obviously, this is a pretty subjective topic, and which version you prefer will depend on what you're familiar with and the languages you use.

For me, readability is about how intuitive something is, how much can be understood with basic knowledge of programming in general and perhaps a quick primer on the specific language. I think this is the best measurement because it emphasizes a reliance on the most commonly used, general concepts, and encourages people to use those unless there's a good reason not to. A side benefit is that the more basic language primitives and flow controls tend to be more performant, better tested, and have fewer weird edge cases where they behave in an unexpected manner.

I dislike your first example because

a) imo, the lack of spacing makes it harder to see what's going on. it's more obvious that something is being iterated in a for loop with a new indentation level than in the map function.

b) it depends on the Python-specific implementation of lambda. The behavior of a lambda varies substantially from language to language and lambdas are used rarely enough that it's pretty likely someone who doesn't spend all day every day in Python is going to have to go back and look up the specific behavior. The syntax is much less clear than a full function definition.

In my opinion, it's much easier to read code like:

    def capitalize(str):
        return "{}{}".format(str[0].upper(), str[1:])

    for word in long_string.split(' '):
        captizalize(word)
    ...
This makes it much more obvious what's going on, and it should be readable to anyone with a passing knowledge of Python, and possibly anyone with a knowledge of programming languages in general. Invoking map and lambda in this case only makes the intent of the program more obscure.

I'm not saying that map or lambda are never appropriate to use; sometimes they are. But I don't think it's wise to use them when more basic, universal language constructs do an equally adequate job, especially if the only benefit is "fewer lines of code".

The second example is just a list comprehension form of my original example, which IMO is less readable for much the same reasons. If you're not super familiar, you'll need to go back and look up list comprehensions. There is no spacing to make it obvious that something particular is being iterated or branched.

I understand that ultimately, ease of reading comes down to what style one is most familiar with, which makes it subjective, as you said. But I think there is a stronger rational basis for always preferring the simplest construct that adequately performs the function, which is that in the general case, there is less need to refer back to docs, less possibility of unexpected behavior, and less possibility of strange performance issues.

Not having touched Python for many years, four things jump out at me:

1) .format, which isn't immediately familiar and doesn't resemble similar string formatters in other languages

2) [1:], which I believe is a string slicing syntax that doesn't resemble similar syntax in other languages

3) Bug: nothing is done with the capitalized words, since the return value is thrown away

4) Bug: the name of the function was misspelled when used.

The last one may seem like a nitpick, but it is true that when you add a name to your code for the sake of clarity, you also take on the additional burden of ensuring the name is used consistently and accurately everywhere. This can be a particular pain in cases where you are generating a lot of uninteresting temporary values -- which is precisely why people end up writing chained function or method calls.

Point 1 is conceded in my earlier response. You're right that someone who is unfamiliar with Python is going to be stopped by the format syntax. However, it's simple to acclimate and learn the basics of it and is something that is very pervasive in Python code and is one of the safest ways to interpolate string data, so its use is justified.

Point 2, this type of string slicing syntax is pretty common in languages similar to Python. A very similar syntax exists in Ruby and Perl. See https://en.wikipedia.org/wiki/Array_slicing for more examples of slice shorthand in the wild.
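
For anyone who hasn't seen the slice shorthand, a minimal sketch of what the two expressions in the examples do. The variable names are illustrative only.

```python
word = 'mxozl'

first = word[0]     # 'm' (single index; raises IndexError on an empty string)
rest = word[1:]     # 'xozl' (everything from index 1 to the end)

print(first.upper() + rest)   # Mxozl

# Slices never raise on short input, so word[:1] is a tolerant
# alternative to word[0] when empty words are possible.
empty = ''
print(repr(empty[:1].upper() + empty[1:]))   # ''
```

The empty-word case can come up in practice: `'a  b'.split(' ')` yields an empty string between the two spaces.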

Point 3. I didn't intend to rewrite the whole thing again, just enough to demonstrate my problems with the reply's lambda-based approach. This is indicated by the ellipsis. If I were to create a full version instead of the quick demonstration of the more-clear full function definition here, then yes, I would've assigned the output of the function to something.
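
One way a full version could look, offered only as a guess and not as the code the ellipsis stood for. It keeps the return value, joins with `''` as in the first sample of the thread, and renames the parameter to `word` (an assumption, chosen so the built-in `str` isn't shadowed).

```python
def capitalize(word):
    # Upper-case the first character, leave the rest untouched.
    return "{}{}".format(word[0].upper(), word[1:])

long_string = 'ajix mxozl xoap'

capitalized = []
for word in long_string.split(' '):
    # Keep the return value instead of discarding it.
    capitalized.append(capitalize(word))
new_string = ''.join(capitalized)

print(new_string)   # AjixMxozlXoap
```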

Point 4. Conceded.

OK, but that doesn't seem much different from "code written in Python should be written in a style familiar to Python developers", which is a good principle regardless of the language. In Ruby, the use of Enumerable methods is generally preferred; it would be as "surprising" to use a for loop in this case as it would be to use map in Python.

I also don't see a good argument that string slicing, `for ... in` and Python-style string formatters are more "universal" than lambdas and map/filter/reduce. All of them exist in large subsets of commonly used programming languages; all of them are used heavily in some languages and rarely in others; only one of them (in the form of sprintf) exists in C.

> OK, but that doesn't seem much different from "code written in Python should be written in a style familiar to Python developers", which is a good principle regardless of the language. In Ruby, the use of Enumerable methods is generally preferred; it would be as "surprising" to use a for loop in this case as it would be to use map in Python.

Yes, ultimately, it's a judgment call about the degree to which language-specific caveats and conventions are considered substantially beneficial to justify their introduction.

I will, however, state that I hate that Ruby uses .each instead of for. I do use .each when I write Ruby because, as you stated, most Ruby devs will look at you sideways if you use a real for loop, but I really dislike that it's become that way. That's an example of something that's different just for difference's sake; any benefit derived from it is marginal and corner-case (something like wanting to override the standard Enumerable behavior), and it makes the whole thing more insular (meaning the behavior is difficult to generalize or extrapolate beyond Ruby, you have to look up the specific behavior), less friendly (meaning it draws attention to itself and takes away productive time; when your code style does this, it needs good justification), and harder to read (because of the two preceding points). IMO, that's a tradeoff that wasn't worthwhile.

> I also don't see a good argument that string slicing, `for ... in` and Python-style string formatters are more "universal" than lambdas and map/filter/reduce. All of them exist in large subsets of commonly used programming languages; all of them are used heavily in some languages and rarely in others; only one of them (in the form of sprintf) exists in C.

Again, it's about being as minimally disruptive as possible to the typical programmer who would be reading the code. While map/filter/reduce may exist in some form in most languages, they're not very commonly used by programmers with an imperative background. Python's formatting rules make them harder to read than language-construct counterparts like for. Most other imperative languages don't enforce specific formatting rules, so they don't preclude the programmer from laying out a map call so that its iteration is as visually obvious as a for loop's typical indentation and/or bracing, but such formatting would be difficult and unusual to maintain.

String slicing is a very common need and most languages provide a simple way to perform it, whether it's the slicing shorthand or substr() calls. You're right that C doesn't provide tools for this, but C doesn't even recognize the string as a thing; you just work with groups of chars. The programming community has clearly repudiated that philosophy and demonstrated that it expects its languages to do most of the string dirty work directly. Same goes for string formatters, even though, as I've stated 3 times now, Python's is unfortunately a bit anomalous for not much benefit. This is improved with PEP 498.

"for in" is nearly self-evident and easy to remember. While this may or may not differ slightly from the specific syntax used in other Python-style languages, it's fairly obvious to someone who is familiar with that language class, and easy enough for someone who has only been exposed to C to pick up and remember once they've read about it once. There also isn't really a more or equally obvious way to express this in Python.

I accept that there are some classes of languages where this is not the case, primarily functional languages. In those cases, the most simple, universally-applicable approach for that language family should be used.

In Python, I'd use a generator expression and str.title(). Assuming I haven't screwed up, then it's just:

    new_string = "".join(word.title() for word in long_string.split(" "))
Agreed. My first thought was to use `new_string = long_string.title().replace(' ', '')` or the generator you mentioned.
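
A small sketch checking that the two one-liners agree on the sample input, and where title() departs from the `word[0].upper()` versions earlier in the thread. The `mixed` input is made up for illustration.

```python
long_string = 'ajix mxozl xoap'

via_generator = "".join(word.title() for word in long_string.split(" "))
via_replace = long_string.title().replace(' ', '')

print(via_generator)   # AjixMxozlXoap
print(via_replace)     # AjixMxozlXoap

# Unlike word[0].upper() + word[1:], title() also lower-cases
# the remaining letters, so mixed-case input comes out differently.
mixed = 'iPhone case'
with_title = "".join(w.title() for w in mixed.split(" "))
with_upper = "".join(w[0].upper() + w[1:] for w in mixed.split(" "))

print(with_title)   # IphoneCase
print(with_upper)   # IPhoneCase
```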