

by robotresearcher 3077 days ago

return x >= 'A';

Would be better than

return x >= ASCII_A;

surely. ASCII_A could be set incorrectly, or have a dumb type, and is more verbose anyway. By using the character directly, the code speaks its purpose.

5 comments

coldtea 3077 days ago

>ASCII_A could be set incorrectly, or have a dumb type, and is more verbose anyway. By using the character directly, the code speaks its purpose.

I disagree. ASCII_A speaks its purpose (we purposefully want an ASCII A stored here). And one can check the constant's definition, and immediately tell if it's correct. E.g.

  const ASCII_A = 'A' // correct

  const ASCII_A = 'E' // wrong

So:

  return x >= ASCII_A

tells us the intention of the code's author.

Whereas:

  return x >= 'A';

only tells us what the code does, which might or might not be correct (and we have no way of knowing, without some other documentation).

So, by those two lines:

  const ASCII_A = 'E';
  (...)
  return x >= ASCII_A;

We know what the code is meant to do, AND that it does it wrongly (and thus, we know what to fix).

This line, on the other hand:

  return x >= 'A';

tells us nothing. Should it be 'A'? Should it be something else? We don't know.

icebraining 3077 days ago

How do you know that it's the "E" that is wrong, and not the ASCII_A? Maybe it should be ASCII_E.

(If you say it's because it's written twice, well, that's only a valid clue if ASCII_E doesn't happen to be defined too.)

coldtea 3077 days ago

>How do you know that it's the "E" that is wrong, and not the ASCII_A? Maybe it should be ASCII_E.

Ultimately you don't, but ASCII_A requires double the intentional actions to name it and have it also be 'A', whereas 'A' vs 'E' or whatever else is a much easier typo.

It's the whole idea behind NOT having magic values in your code. That is, that:

  if (temp > 212)

tells us much less than:

  if (temp > WATER_BOILING_TEMP)

and that we can more easily spot an error with:

  WATER_BOILING_TEMP = 275

than with:

  if (temp > 275)
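
The magic-value argument above can be sketched in Python. This is only an illustration: the names `WATER_BOILING_TEMP_F` and `is_boiling` are hypothetical, and the value assumes degrees Fahrenheit at standard atmospheric pressure.

```python
# Hypothetical names for illustration. 212 °F is the boiling point of
# water at standard atmospheric pressure (1 atm), which is an assumption
# about what the original snippet intended.
WATER_BOILING_TEMP_F = 212


def is_boiling(temp_f):
    """Return True if temp_f is above the named boiling-point threshold."""
    return temp_f > WATER_BOILING_TEMP_F
```

A wrong value such as 275 would then sit next to a name that says what it is supposed to be, instead of inside a bare comparison.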

icebraining 3077 days ago

>Ultimately you don't, but ASCII_A requires double the intentional actions to name it and have it also be 'A', whereas 'A' vs 'E' or whatever else is a much easier typo.

Unless, as I wrote after, you have both ASCII_A and ASCII_E declared, which wouldn't be surprising.

I don't find the "spot the error" argument to be very convincing; I still name stuff, but just for the semantic value.

dragonwriter 3077 days ago

275°F is about right at a little over 3 bar.

Or 275°C at around 60 bar.

phkahler 3077 days ago

return x >= "A"; // ascii A

Gets the whole message across in one line, as does using 65 with the comment.

yesenadam 3077 days ago

Comments aren't so good - now you have 2 things to change when the program changes. In the real world, often the comment won't be updated and will become actively misleading/wrong/bad.

robotresearcher 3077 days ago

(Ignoring the typo "A" != 'A')

return x >= 'A';

already and only means ASCII A. Is there a C compiler anywhere, now or likely in the future, where 'A' in C is NOT ASCII A? The comment is redundant if correct, and could be wrong after an edit, so it has no value.

coldtea 3077 days ago

>return x >= 'A'; already and only means ascii A.

See, here's where you are wrong.

  ASCII_A = "A"

  alphas = ["Α", "А", "Ꭺ", "ᗅ", "ꓮ", "Ａ", "𐊠", "A", "𝐀", "𝖠", "𝙰", "𝚨", "𝝖"]

  for c in alphas:
      print(c == ASCII_A)

Output?

  False
  False
  False
  False
  False
  False
  False
  True
  False
  False
  False
  False
  False

Those are several of the many possible Unicode lookalikes for A. They are not A in different fonts -- they are different Unicode characters that look like A. Depending on your font, they could look exactly the same as the plain ASCII A (which is only one entry, towards the middle of the list). And depending on your locale and keyboard language settings, one of them could be as easy to type as the regular English A in ASCII.
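
One way to tell such lookalikes apart is to print their code points and Unicode names. A minimal Python 3 sketch, using only three of the characters and writing them as escapes so there is no doubt which is which (the variable name `lookalikes` is just for illustration):

```python
import unicodedata

# Three capital letters that render almost identically in many fonts,
# written as escapes so the code points are unambiguous.
lookalikes = ["\u0391", "\u0410", "\u0041"]

for ch in lookalikes:
    # Show the code point, the official Unicode name, and whether it is ASCII.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  ascii={ord(ch) < 128}")
```

Only the last one, U+0041 LATIN CAPITAL LETTER A, is in the ASCII range; the other two are the Greek alpha and the Cyrillic A.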

robotresearcher 3077 days ago

I deliberately used the character literal ‘A’ and not any of your UTF8 strings. I think you are mistaken to confuse a character with your strings. Is this wrong?

coldtea 3077 days ago

You can have a unicode character literal -- and depending on the language there's no distinction between character and string (at the type level), a character is just a string of length 1.

mannykannot 3077 days ago

Be careful with your quotes (depending on which pseudo-language this is.)

coldtea 3077 days ago

Without the convenience of autocomplete and re-use in other places in the code, and with a comment that can always get out of sync with what the code does much easier than a named constant.

phkahler 3077 days ago

My comment was a bit weak. Putting something more of a requirement or design intent in the comment is better. Having it all there can be better than a well described constant with a definition somewhere else. Sure, they could get out of sync but at least you'll be able to see the discrepancy right there on that line if you're looking. But to each their own.

zb 3077 days ago

You must be one of those people who writes stuff like #define TWO 2

vlovich123 3077 days ago

In this strawman example, perhaps. However, code is usually surrounded by other code. So you could have the 'A' in multiple places. By using an explicit identifier you are protecting yourself against typos (depending on the language, it could be a compile-time error or at worst a very clear runtime error instead of a logic error). The other benefit of ASCII_A is that you are signalling that you are doing ASCII comparisons as opposed to using 'A' as a placeholder for a special value of 65 and thus confusing the reader (e.g. some spec says 65 is some kind of magic value). Finally, by having an ASCII_A it provides you with the opportunity to add documentation explaining why this constant is the way it is (why not 'B'). The benefits scale with the number of instances (e.g. if that specific 'A' appears multiple times in a file, you wouldn't be able to document it in 1 spot).

Of course, all of this is likely overkill for your specific example. If I'm writing a to_hex routine, I'm not going to extract those constants as the context & commonplaceness of the algorithm makes it redundant. For the same reason that one might write i++ in a for loop instead of i += ONE. However, extracting inline constants to named variables is frequently something I look out for in code review, especially the more frequently the same constant appears in multiple places, the more difficulty a reader might have trying to understand why that value is the way it is (or if there's any discussion at all), or if it's a value that will potentially change over time. The negative drawbacks of extracting constants is typically minimal & with modern-day refactoring it's a very small ask of the contributor.
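
The point about typos can be sketched in Python, where a mistyped identifier is the "very clear runtime error" case. All names here are hypothetical, and both typos are deliberate:

```python
ASCII_A = "A"  # hypothetical named constant


def check_with_literal_typo(ch):
    # Intended 'A' but typed 'S': runs without complaint, gives wrong answers.
    return ch >= "S"


def check_with_name_typo(ch):
    # Intended ASCII_A but typed ASCI_A: raises NameError when called.
    return ch >= ASCI_A  # noqa: F821 (deliberate typo)


print(check_with_literal_typo("B"))  # False, although "B" >= "A" is True
try:
    check_with_name_typo("B")
except NameError as err:
    print("caught:", err)
```

In a compiled language the second mistake would typically be rejected at compile time; in Python it only shows up when the function runs, unless a linter flags it first.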

sheepmullet 3077 days ago

> The negative drawbacks of extracting constants is typically minimal

> ASCII_A

It comes down to naming and purpose.

The example, ASCII_A, is terrible because it doesn't describe the purpose with its name.

What will end up happening in any large codebase is ASCII_A will get reused in dozens of different places for dozens of different reasons.

If it was named minValidLetterForAlgorithmX it would convey intent and it's more likely to be used correctly.

Terr_ 3077 days ago

I'm partial to ALPHA_START or FIRST_LETTER. While it's true that 'A' is both, the naming helps communicate that the context is range-testing for alphabets inside a larger character set.
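
A small Python sketch of that kind of range test, with purpose-oriented names. `ALPHA_START` comes from the comment above; `ALPHA_END` and `is_upper_alpha` are hypothetical, and the bounds assume the ASCII uppercase range A to Z:

```python
# ALPHA_END and is_upper_alpha are illustrative names, not from the thread.
# The bounds assume the ASCII uppercase range A..Z.
ALPHA_START = "A"
ALPHA_END = "Z"


def is_upper_alpha(ch):
    """Return True if ch is a single character in the range A..Z."""
    return len(ch) == 1 and ALPHA_START <= ch <= ALPHA_END
```

Here the names describe the role of each bound in the range test, so a reader does not need to know why 'A' in particular was chosen.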

flukus 3077 days ago

> In this strawman example, perhaps.

I'm not so sure it's a straw man. I often see defining constants like this cargo culted even if there are only one or two uses. In that case 'A' is great because its value is right there: I don't have to look at the assignment and then go look up what the actual value is, so it's more readable.

When it's used in several disparate places then ASCII_A is better and your arguments about correctness should take precedence, we sacrifice some readability but it's worth it.

vlovich123 3076 days ago

It's a strawman in the sense that it's a contrived example, completely devoid of context. FWIW, I found 0 instances of something like this on GitHub (https://github.com/search?q=%22ASCII_A%22&type=Code&utf8=%E2...). I concur that cargo culting it to the extreme can lead to absurdity, but that's true of all maintainability rules of thumb. Any rule of thumb can be over-applied. However, in my experience the inverse is generally more true.

robotresearcher 3077 days ago

Sure, I understand. The surrounding code would include the type of x, which, if char, would help understanding even more.

But you’re channeling some crazy madness suggesting that someone would use ‘A’ to mean 65. Shudder. I guess we’ve all seen some horrors over the years.

coldtea 3077 days ago

>But you’re channeling some crazy madness suggesting that someone would use ‘A’ to mean 65.

Or just an encoding scheme.

robotresearcher 3077 days ago

Where 65 means ‘A’? Madness.

vlovich123 3076 days ago

You could have a binary file format with a header of ABBA. You could choose to check the signature by doing an integer comparison of 65666665, 0x41424241 or "ABBA". Like I said, ASCII_A is a bit silly, but the maintenance value of extracting constant literals to constant variables with an explicit name & documentation explaining where the constant comes from is pretty solid, at least in my experience.
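
The file-signature case could look roughly like this in Python. `FILE_SIGNATURE` and `has_valid_signature` are hypothetical names, and the four-byte "ABBA" header is the made-up format from the comment above:

```python
# Hypothetical file signature: the four ASCII bytes "ABBA".
FILE_SIGNATURE = b"ABBA"


def has_valid_signature(header):
    """Return True if the first four bytes match the expected signature."""
    return header[:4] == FILE_SIGNATURE


# The same four bytes read as one big-endian integer.
print(hex(int.from_bytes(FILE_SIGNATURE, "big")))  # 0x41424241
```

Whichever spelling is used for the comparison, the named constant gives one place to document where the value comes from.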

vageli 3077 days ago

A is 65 in ASCII. http://www.asciitable.com

robotresearcher 3077 days ago

I know. It was a joke.

buckminster 3077 days ago

He says that in the article:

> ASCII_A (usually spelled just 'A')

Of course, they are not the same thing. In the last 6 months I've worked on a very old system that uses not-quite-ASCII. 'A' was 65 but '#' wasn't 35.
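
As a rough illustration that the number behind 'A' depends on the encoding, here is a Python sketch. It uses `cp037`, an EBCDIC code page that ships with Python, purely as a stand-in for a non-ASCII system; it is not the not-quite-ASCII system described above, where 'A' was still 65.

```python
# 'A' as a code point, and as a byte in two different encodings.
# cp037 (EBCDIC) is only an example of a non-ASCII encoding here.
print(ord("A"))                # 65
print("A".encode("ascii")[0])  # 65
print("A".encode("cp037")[0])  # 193
```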

stevenwoo 3077 days ago

There's the theory that any hardcoded constant directly in code is a bad idea. It may be used more than once, or used only once now but more than once in the future, or the value may be changed in the future, and if it's used more than once, that is a source of issues.

yongjik 3077 days ago

I get that using hard-coded constants is a bad idea, but using ASCII_A instead of 'A' is about as sensible as using SIXTY_FOUR instead of 64.

If A signifies something else, use that name; otherwise just use plain 'A': it already gives us as much information as needed, and has one less place where the programmer can screw up.

robotresearcher 3077 days ago

I get that in general. It depends if the code is meant to inspect the character x on this machine right now, or really the ASCII character x.

As an aside, if someone changes the constant value of ‘A’ now, the world will be broken for a while. (But my code would recompile correctly unchanged with the new standard header.)

pkamb 3077 days ago

https://stackoverflow.com/questions/3202629/where-can-i-find...