by jasode 4289 days ago
It just makes it easier to read the magnitude of a number and get that extra visual reassurance that it's correct. It's not easy to (quickly) tell the difference between 1 billion and 100 million unless you count zeros (1000000000 vs 100000000). On the other hand it's very easy to discern 1'000'000'000 vs 100'000'000.
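
To make that concrete, here is a minimal C++14 sketch of the separator syntax. The names `one_billion`, `hundred_million` and `ratio` are made up for illustration; only the apostrophe syntax itself comes from the standard.

```cpp
#include <cassert>

// C++14 digit separators: the apostrophes are purely visual.
constexpr long long one_billion     = 1'000'000'000;
constexpr long long hundred_million = 100'000'000;

// Same values written without separators.
static_assert(one_billion == 1000000000, "separators do not change the value");
static_assert(hundred_million == 100000000, "separators do not change the value");

// Grouping is not restricted to threes, and it works in other bases too.
static_assert(0xFF'FF == 65535, "hex literal with a separator");
static_assert(0b1010'0101 == 165, "binary literal with a separator");

// Hypothetical helper for illustration: how many times larger is a than b?
inline long long ratio(long long a, long long b) {
    assert(b != 0);
    return a / b;
}
```

Calling `ratio(one_billion, hundred_million)` gives 10, which is much easier to sanity-check by eye when the literals carry separators.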

The standard can't use commas as a separator because whether commas or periods separate the digit groups is locale-specific. Apostrophes look like a reasonable compromise because lots of calculators already use apostrophes to separate digits:

https://www.google.com/search?q=14+digit+calculator&source=l...


I would've gone with spaces. Those are the official standard here in Canada as a pragmatic compromise between anglos and francos that have opposing usage of comma/dot.

So instead of 1,234.56789 (English) or 1.234,56789 (French) we do 1 234.56789 (international)

[0-9]+ [0-9]+ has no meaning in C++, so using space for magnitude decorations would have worked.

Nice, it fits well with the "adjacent strings are automatically concatenated by the compiler" rule, too.
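
For anyone unfamiliar with the rule being referred to, a small sketch of adjacent string literal concatenation. The names `greeting` and `concatenated_as_expected` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Adjacent string literals separated only by whitespace are merged into
// one literal during translation.
constexpr const char* greeting = "digit " "separators";

// Hypothetical helper: checks that the merged literal has the expected text.
inline bool concatenated_as_expected() {
    return std::strcmp(greeting, "digit separators") == 0;
}
```

A space-separated number like 1 000 000 would have been the numeric analogue of this, though that is not what C++14 adopted.
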
Have you considered if this would fit with C++'s grammar?
Briefly, but I think there are like three people in the world that fully grok C++ grammar, and I'm definitely not one of them. I just know that C++ generally requires some kind of infix operator or delimiter between pairs of literals or variables, so I'm pretty sure two numbers separated only by a space are currently just a compiler error in every case.
Except this is not C++ grammar, this is simple lexical analysis. And separating by spaces would work fine.
What about semantics? Probably 0.
The bigger reason you can't use commas is that the comma is already an operator in C++. 1,2 is an expression that evaluates to 2.

Dunno why they didn't use underscores though.

Underscores are already being used for user-defined literals.
User-defined literal suffixes generally can't start with a digit, so it should be perfectly fine to allow 1_000_000 syntax for 1000000. Or am I forgetting something?
Hexadecimal numbers, probably. 0x001_f05 would be ambiguous.
I don't think that's it. 0x001_f05 starts with 0, which is still not something an identifier can do.
0x001_f05 could be a UDL named f05 which should be passed 0x001.
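
A minimal sketch of that reading, assuming a made-up literal suffix `_f05` that simply returns whatever integer it is applied to:

```cpp
#include <cassert>

// Hypothetical user-defined literal suffix named _f05. It returns the
// integer it was applied to, so we can see what the compiler passed in.
constexpr unsigned long long operator""_f05(unsigned long long value) {
    return value;
}

// The token 0x001_f05 is read as the hex literal 0x001 followed by the
// suffix _f05, not as the number 0x001f05.
static_assert(0x001_f05 == 1, "suffix _f05 applied to 0x001");
static_assert(0x001f05 == 7941, "plain hex literal without a suffix");
```

So with underscores as separators, the same spelling would have had to mean either 1 (via the suffix) or 7941 (as a separated hex number).
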
Where did you get this information? It's entirely incorrect.
The "already an operator" point can be a factor in the design reasoning, but if the committee really wanted to, it could have made the comma work as a digit separator by modifying the grammar, e.g. with a special prefix and/or contextual parsing.

For example, "->" was already the operator for member access through a pointer, but it was reused for the trailing return type in lambda definitions. "[]" was already used for array subscripts, but was reused for lambda captures.
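
A short sketch of both tokens doing double duty in one function. The function `lambda_demo` and everything inside it are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Hypothetical demo: "[]" and "->" in their C++11 lambda roles next to
// their older roles as subscript and member-access operators.
inline int lambda_demo() {
    int offset = 10;
    int values[] = {1, 2, 3};

    struct Point { int x; };
    Point pt{5};
    Point* pp = &pt;

    // Here [] introduces the capture list and -> the trailing return type.
    auto add = [offset](int v) -> int { return v + offset; };

    // Here [] is an array subscript and -> is member access via a pointer.
    return add(values[2]) + pp->x;   // (3 + 10) + 5
}
```

The parser tells the two uses apart purely from where the token appears in an expression.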

You can't do that for ",", because it would break existing code. For example, int x = 0; x = 1,2; is perfectly valid pre-C++11 code, and it assigns 1 to x. If you made "," a digit separator, it would assign 12 to x.
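
Here is that statement as a compilable sketch, with a parenthesized variant for contrast. The function names are made up, and a compiler may warn about the unused operand of the comma operator.

```cpp
#include <cassert>

// Hypothetical helper mirroring the statement from the comment above.
inline int assign_with_comma() {
    int x = 0;
    x = 1, 2;    // parsed as (x = 1), 2 because '=' binds tighter than ','
    return x;
}

// With parentheses, the comma expression is evaluated first and yields 2.
inline int assign_parenthesized() {
    int x = 0;
    x = (1, 2);  // 1 is evaluated and discarded, then 2 is assigned
    return x;
}
```

The first function returns 1 and the second returns 2, so any rule that turned 1,2 into the number 12 would silently change the meaning of both.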
I mentioned the possibility of a special prefix in my previous post. In other words, that parsing ambiguity goes away if the standards committee wanted to define a special prefix such as 'd', 'k', or '_' to inform the parser that the next comma is a "digit separator" instead of "comma operator" such as _1,000,000 . E.g. there's already intelligence about commas in the parser to disambiguate function calls with commas as in "repositionxyz(914,348,122)"

Or, they could have defined C++14 to simply invalidate your example syntax of "x=1,2". As an example, the "auto" keyword was made a "breaking change" such that "for (auto int i = 0;;)" no longer compiles.

The bottom line is that there are a myriad of ways to address (potential) parsing ambiguities when introducing new language features and syntax. (Whether a comma digit separator is worth the clumsier syntax of a special prefix, or the pain of a breaking change, is a separate question.)

C++ doesn't have the friendliest grammar for addressing potential parsing ambiguities.

As an example, your proposed solutions

   _1,000,000
   k1,000,000
Both are already valid if variables or functions _1 or k1 are in scope, and evaluate to (octal) zero.
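
A quick sketch of the first case, assuming a local variable named `_1` (the function name `prefixed_comma_value` is hypothetical). Expect compiler warnings about unused operands of the comma operator.

```cpp
#include <cassert>

// Hypothetical function showing that "_1,000,000" already parses today
// when a variable named _1 is in scope.
inline int prefixed_comma_value() {
    int _1 = 42;            // any in-scope variable named _1 will do
    return (_1,000,000);    // comma chain: _1, then 000, then 000
}
```

The result is the last operand, 000, which is an octal literal with the value 0, so the existing meaning has nothing to do with one million.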

And I haven't checked, but I would guess that invalidating "x=1,2" in the parser would not be "simple". I cannot think of a reason, but it also might turn out to be more common than one would think due to macro expansions.