by robotresearcher 3077 days ago
  return x >= 'A';

Would be better than

  return x >= ASCII_A;

surely. ASCII_A could be set incorrectly, or have a dumb type, and is more verbose anyway. By using the character directly, the code speaks its purpose.


>ASCII_A could be set incorrectly, or have a dumb type, and is more verbose anyway. By using the character directly, the code speaks its purpose.

I disagree. ASCII_A speaks its purpose (we purposefully want an ASCII A stored here). And one can check the constant's definition and immediately tell if it's correct. E.g.

  const ASCII_A = 'A' // correct

  const ASCII_A = 'E' // wrong
So:

  return x >= ASCII_A
tells us the intention of the code's author.

Whereas:

  return x >= 'A';
only tells us what the code does, which might or might not be correct (and we have no way of knowing without some other documentation).

So, from these two lines:

  const ASCII_A = 'E';
  (...)
  return x >= ASCII_A;
We know what the code is meant to do, AND that it does it wrongly (and thus, we know what to fix).

This line, on the other hand:

  return x >= 'A';
tells us nothing. Should it be 'A'? Should it be something else? We don't know.
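If you want the intent in the name to be checked by the compiler rather than by a reader spotting the 'E', C11 lets you put a static assertion right next to the definition. A minimal sketch, assuming C11; the helper at_or_after_ascii_A is hypothetical. Redefining ASCII_A as 'E' (or building with a non-ASCII execution character set) makes the build fail at the assertion instead of producing a silent logic error.

```c
/* The name states the intent: the ASCII code point for 'A'. */
enum { ASCII_A = 'A' };

/* Turn that intent into a compile-time check (C11). A mis-set value such
   as 'E', or a non-ASCII execution character set, fails the build here. */
_Static_assert(ASCII_A == 0x41, "ASCII_A must be the ASCII code for 'A'");

/* Hypothetical helper mirroring the thread's `return x >= ASCII_A;`. */
int at_or_after_ascii_A(int x) {
    return x >= ASCII_A;
}
```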
How do you know that it's the "E" that is wrong, and not the ASCII_A? Maybe it should be ASCII_E.

(If you say it's because it's written twice, well, that's only a valid clue if ASCII_E doesn't happen to be defined too.)

>How do you know that it's the "E" that is wrong, and not the ASCII_A? Maybe it should be ASCII_E.

Ultimately you don't, but ASCII_A requires double the intentional actions to name it and have it also be 'A', whereas 'A' vs 'E' or whatever else is a much easier typo.

It's the whole idea behind NOT having magic values in your code. That is, that:

  if (temp > 212)
tells us much less than:

  if (temp > WATER_BOILING_TEMP)
and that we can more easily spot an error with:

  WATER_BOILING_TEMP = 275
than with:

  if (temp > 275)
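The same idea in C, as a minimal sketch with hypothetical names: putting the unit and the pressure into the constant's name gives a reviewer something concrete to check the value against, so 212 can be confirmed and 275 stands out.

```c
/* Hypothetical constant: boiling point of water in degrees Fahrenheit at
   standard atmospheric pressure. Unit and pressure are part of the name,
   so the value can be checked against a reference at the definition. */
#define WATER_BOILING_POINT_F_AT_1_ATM 212.0

/* Returns nonzero if temp_f (degrees Fahrenheit) is above that point. */
int is_above_boiling_f(double temp_f) {
    return temp_f > WATER_BOILING_POINT_F_AT_1_ATM;
}
```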
> Ultimately you don't, but ASCII_A requires double the intentional actions to name it and have it also be 'A', whereas 'A' vs 'E' or whatever else is a much easier typo.

Unless, as I wrote after, you have both ASCII_A and ASCII_E declared, which wouldn't be surprising.

I don't find the "spot the error" argument to be very convincing; I still name stuff, but just for the semantic value.

275°F is about right at a little over 3 bar.

Or 275°C at around 60 bar.

  return x >= "A"; // ascii A

Gets the whole message across in one line, as does using 65 with the comment.

Comments aren't so good - now you have 2 things to change when the program changes. In the real world, often the comment won't be updated and will become actively misleading/wrong/bad.
(Ignoring the typo "A" != 'A')

  return x >= 'A';

already and only means ascii A. Is there a C compiler anywhere, now or likely in the future, where 'A' in C is NOT ascii A? The comment is redundant if correct, and could be wrong after an edit, so it has no value.
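For what it's worth, the C standard does not require the execution character set to be ASCII; it only guarantees that '0' to '9' are contiguous. EBCDIC-based compilers on IBM mainframes are the classic counterexample: there 'A' is 0xC1 and the uppercase letters sit in three separate runs with non-letters in between. A minimal sketch (function names are hypothetical) contrasting a literal range test with the <ctype.h> test:

```c
#include <ctype.h>

/* Range test with character literals: an uppercase-letter test only where
   'A'..'Z' are contiguous, as in ASCII. On EBCDIC the codes between 'A'
   and 'Z' include non-letters, which this would wrongly accept. */
int is_upper_by_range(int c) {
    return c >= 'A' && c <= 'Z';
}

/* Library test, correct for the execution character set and the current
   locale; in the default "C" locale it is true exactly for A to Z. */
int is_upper_by_ctype(int c) {
    return isupper((unsigned char)c) != 0;
}
```

On an ASCII platform in the default locale the two functions agree for every value from 0 to 127; on an EBCDIC platform they would not.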

>return x >= 'A'; already and only means ascii A.

See, here's where you are wrong.

  ASCII_A = "A"

  alphas = ["Α", "А", "Ꭺ", "ᗅ", "ꓮ", "A", "𐊠", "A", "𝐀", "𝖠", "𝙰", "𝚨", "𝝖"]

  for c in alphas:
      print(c == ASCII_A)
Output?

  False
  False
  False
  False
  False
  False
  False
  True
  False
  False
  False
  False
  False
Several of the numerous possible Unicode alphas. Those are not A in different fonts -- they are different Unicode characters that look like A. And depending on your font, they could look exactly the same as plain ASCII A (of which there is only one, towards the middle of the list). And depending on your locale and keyboard language settings, one of them could be as easy to type as the regular English A in ASCII.
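At the byte level the difference is easy to see. In UTF-8, ASCII A is the single byte 0x41, while look-alikes are multi-byte sequences: U+0391 GREEK CAPITAL LETTER ALPHA is CE 91, U+0410 CYRILLIC CAPITAL LETTER A is D0 90, and U+FF21 FULLWIDTH LATIN CAPITAL LETTER A is EF BC A1. A minimal C sketch with a hypothetical function name; the strings are written as explicit bytes so the result does not depend on the source file's encoding.

```c
#include <string.h>

/* Returns nonzero only if s is exactly the one-byte string 0x41 (ASCII A).
   Homoglyphs encode to different, longer byte sequences in UTF-8, e.g.
   "\xCE\x91" (Greek Alpha) or "\xD0\x90" (Cyrillic A). */
int is_exactly_ascii_A(const char *s) {
    return strcmp(s, "\x41") == 0;
}
```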
I deliberately used the character literal ‘A’ and not any of your UTF-8 strings. I think you are mistaken to confuse a character with a string. Is this wrong?
You can have a Unicode character literal -- and depending on the language, there's no distinction between character and string at the type level; a character is just a string of length 1.
Be careful with your quotes (depending on which pseudo-language this is.)
Without the convenience of autocomplete and reuse elsewhere in the code, and with a comment that can get out of sync with what the code does much more easily than a named constant can.
My comment was a bit weak. Putting a requirement or the design intent in the comment is better. Having it all there can be better than a well-described constant with its definition somewhere else. Sure, they could get out of sync, but at least you'll see the discrepancy right there on that line if you're looking. But to each their own.
You must be one of those people who writes stuff like #define TWO 2
In this strawman example, perhaps. However, code is usually surrounded by other code, so you could have the 'A' in multiple places. By using an explicit identifier you are protecting yourself against typos (depending on the language, a misspelled name is a compile-time error or at worst a very clear runtime error, instead of a logic error). The other benefit of ASCII_A is that you are signalling that you are doing ASCII comparisons, as opposed to using 'A' as a stand-in for a special value of 65 and thus confusing the reader (e.g. some spec says 65 is some kind of magic value). Finally, having an ASCII_A gives you a place to add documentation explaining why this constant is the way it is (why not 'B'). The benefits scale with the number of instances (e.g. if that specific 'A' appears multiple times in a file, you wouldn't be able to document it in one spot).

Of course, all of this is likely overkill for your specific example. If I'm writing a to_hex routine, I'm not going to extract those constants, as the context and how commonplace the algorithm is make it redundant, for the same reason that one might write i++ in a for loop instead of i += ONE. However, extracting inline constants into named constants is something I frequently look for in code review, especially when the same constant appears in multiple places, when a reader might have difficulty understanding why the value is what it is (or whether it was ever discussed), or when the value may change over time. The negative drawbacks of extracting constants is typically minimal, and with modern-day refactoring tools it's a very small ask of the contributor.
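A to_hex routine of the kind mentioned above might look like this, with '0' and 'a' left inline because the context makes their role obvious. A minimal sketch; the name and signature are hypothetical. The 'a' + (n - 10) step assumes 'a' to 'f' are contiguous, which holds in ASCII and EBCDIC, though the C standard itself only guarantees this for '0' to '9'.

```c
#include <stddef.h>

/* Writes the lowercase hex form of the len bytes in src into dst, which
   must have room for 2 * len + 1 chars, and NUL-terminates it. */
void to_hex(const unsigned char *src, size_t len, char *dst) {
    for (size_t i = 0; i < len; i++) {
        unsigned hi = src[i] >> 4, lo = src[i] & 0x0F;
        /* Inline literals: here '0' and 'a' explain themselves. */
        dst[2 * i]     = (char)(hi < 10 ? '0' + hi : 'a' + (hi - 10));
        dst[2 * i + 1] = (char)(lo < 10 ? '0' + lo : 'a' + (lo - 10));
    }
    dst[2 * len] = '\0';
}
```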

> The negative drawbacks of extracting constants is typically minimal

> ASCII_A

It comes down to naming and purpose.

The example, ASCII_A, is terrible because it doesn't describe the purpose with its name.

What will end up happening in any large codebase is ASCII_A will get reused in dozens of different places for dozens of different reasons.

If it was named minValidLetterForAlgorithmX, it would convey intent and it's more likely to be used correctly.

I'm partial to ALPHA_START or FIRST_LETTER. While it's true that 'A' is both, the naming helps communicate that the context is range-testing for alphabets inside a larger character set.
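A minimal sketch of purpose-based naming along those lines. ALPHA_START comes from the comment above; ALPHA_END is a hypothetical companion, and the range test assumes an ASCII-like character set in which 'A' to 'Z' are contiguous.

```c
/* Purpose-based names: the bounds of the uppercase alphabet inside a
   larger character set. ALPHA_END is a hypothetical companion name. */
enum { ALPHA_START = 'A', ALPHA_END = 'Z' };

/* Range test for uppercase letters (assumes 'A'..'Z' are contiguous). */
int in_alphabet(int x) {
    return x >= ALPHA_START && x <= ALPHA_END;
}
```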
> In this strawman example, perhaps.

I'm not so sure it's a straw man; I often see constants like this defined out of cargo culting, even when there are only one or two uses. In that case 'A' is great because its value is right there: I don't have to look at the assignment and then go look up what the actual value is, so it's more readable.

When it's used in several disparate places then ASCII_A is better and your arguments about correctness should take precedence, we sacrifice some readability but it's worth it.

It's a strawman in the sense that it's completely devoid of context, with a contrived example. FWIW, I found 0 instances of something like this on GitHub (https://github.com/search?q=%22ASCII_A%22&type=Code&utf8=%E2...). I concur that cargo culting it to the extreme can lead to absurdity, but that's true of all maintainability rules of thumb: any rule of thumb can be over-applied. However, in my experience the inverse is generally more true.
Sure, I understand. The surrounding code would include the type of x, which, if char, would help understanding even more.

But you’re channeling some crazy madness suggesting that someone would use ‘A’ to mean 65. Shudder. I guess we’ve all seen some horrors over the years.

>But you’re channeling some crazy madness suggesting that someone would use ‘A’ to mean 65.

Or just an encoding scheme.

Where 65 means ‘A’? Madness.
You could have a binary file format with a header of ABBA. You could choose to check the signature by comparing the bytes 65, 66, 66, 65, the 32-bit integer 0x41424241, or the string "ABBA". Like I said, ASCII_A is a bit silly, but the maintenance value of extracting literal constants into named constants, with an explicit name and documentation explaining where the constant comes from, is pretty solid, at least in my experience.
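A minimal sketch of that signature check, assuming a hypothetical format whose files begin with the four ASCII bytes "ABBA" (the names FILE_MAGIC and has_abba_signature are also hypothetical). Comparing bytes with memcmp sidesteps byte order; reading the same four bytes as a 32-bit integer would give 0x41424241 on both little- and big-endian machines only because "ABBA" happens to be a palindrome.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical format: files start with the ASCII bytes "ABBA". Written as
   numeric byte values so the check means ASCII regardless of the compiler's
   execution character set, and documented in exactly one place. */
static const unsigned char FILE_MAGIC[4] = {0x41, 0x42, 0x42, 0x41};

/* Returns nonzero if buf (len bytes long) starts with the signature. */
int has_abba_signature(const unsigned char *buf, size_t len) {
    return len >= sizeof FILE_MAGIC &&
           memcmp(buf, FILE_MAGIC, sizeof FILE_MAGIC) == 0;
}
```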
A is 65 in ASCII. http://www.asciitable.com
I know. It was a joke.
He says that in the article:

> ASCII_A (usually spelled just 'A')

Of course, they are not the same thing. In the last 6 months I've worked on a very old system that uses not-quite-ASCII. 'A' was 65 but '#' wasn't 35.

There's the theory that any hardcoded constant directly in code is a bad idea. It may be used more than once; or it may be used only once now but more than once in the future; and if the value ever changes while it's used in more than one place, that becomes a source of bugs.
I get that using hard-coded constants is a bad idea, but using ASCII_A instead of 'A' is about as sensible as using SIXTY_FOUR instead of 64.

If A signifies something else, use that name; otherwise just use plain 'A': it already gives us as much information as needed, and has one less place where the programmer can screw up.

I get that in general. It depends if the code is meant to inspect the character x on this machine right now, or really the ASCII character x.

As an aside, if someone changes the constant value of ‘A’ now, the world will be broken for a while. (But my code would recompile correctly unchanged with the new standard header.)
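The difference between "the character on this machine" and "really the ASCII character" can be made explicit in C. A minimal sketch with hypothetical names: the first function compares against whatever code the compiler uses for 'A'; the second compares a byte from an ASCII-defined file or protocol against the numeric code 0x41.

```c
/* Is c the letter A in this platform's execution character set? */
int is_host_A(char c) {
    return c == 'A';
}

/* Is b the ASCII code for 'A', e.g. a byte read from a file format or
   network protocol that is specified in terms of ASCII? */
int is_ascii_A_byte(unsigned char b) {
    return b == 0x41;
}
```

On an ASCII platform the two agree; on an EBCDIC platform is_host_A would match 0xC1 while is_ascii_A_byte would still match only 0x41.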