
	by dgrunwald 352 days ago
	But the source character set remains implementation-defined, so compilers do not have to directly support unicode names, only the escape notation. Definitely a questionable choice to throw off readers with unicode weirdness in the very first code example.

qsort 352 days ago

If it were up to me, anything outside the basic character set in a source file would be a syntax error; I'm simply reporting what the spec says.

ncruces 352 days ago

I use Unicode for math in comments, and I think it makes certain complicated formulas far more readable.

kzrdude 352 days ago

I've just been learning pinyin notation, so now I think the variable řₚ should have a value that first goes down a bit and then up.

zelphirkalt 352 days ago

I am not sure it is a good idea to mix such specific phonetic script ideas about diacritic marks with the behavior of the program over time. Even considering the shape, it does not align with the idea of first down a little, then up a lot.

kzrdude 351 days ago

To be sure, it's a joke. Mostly trying to joke at the expense of these excessively complicated variable names (that are only there because it's pseudocode) :)

And yeah, the Chinese tone in practice does not align with the idea of "down a little, up a lot" either. It depends on context...

guipsp 352 days ago

What a "basic character set" is depends on locale

qsort 352 days ago

https://en.cppreference.com/w/c/language/charset.html

account42 352 days ago

Anything except US-ASCII in source code outside comments and string constants should be a syntax error.

guipsp 352 days ago

You are aware other languages exist? Some of which don't even use the Latin script?

nottorp 352 days ago

Dunno about the OP, but I'm very aware, as I'm not a native English speaker.

I still don't want anything as unpredictable as Unicode in my code. How many different encodings will display as the same variable name and how is the compiler supposed to decide?

If you're thinking of comments and user facing strings, the OP already excluded those.
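
To make the "different encodings, same variable name" concern concrete, here is a minimal sketch in Python (chosen only because its standard library ships Unicode normalization; the thread itself is about C). The identifier "café" and the helper name `same_after_nfc` are illustrative assumptions, not anything from the C standard or a particular compiler.

```python
import unicodedata

# Two spellings that render as the same visible name "café":
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Compared code point by code point, the two spellings differ.
raw_equal = precomposed == decomposed

# Compared after normalization, they are the same name.
nfc_equal = same_after_nfc(precomposed, decomposed)

print(len(precomposed), len(decomposed), raw_equal, nfc_equal)
```

A tool that compares identifiers code point by code point would see two distinct names here; one that normalizes first would see a single name.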

cryptonector 351 days ago

The language, compiler, and linker should reject Zalgo in identifiers, and they should reject confusable script mixes in identifiers, but otherwise they should treat all equivalent strings as equivalent. To make it easier on the linker, compilers should normalize all symbols to one common form (e.g., NFC).
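
A rough sketch of what such a policy could look like, in Python for brevity. Everything named here (`canonical_symbol`, `script_of`, `MAX_MARKS_PER_BASE`) is hypothetical, and the script check is a crude heuristic based on the first word of each character's Unicode name; a real implementation would presumably use the Unicode Script property and the confusables data from UTS #39 instead.

```python
import unicodedata

# Assumed threshold: more stacked combining marks than this counts as "Zalgo".
MAX_MARKS_PER_BASE = 2


def script_of(ch: str) -> str:
    """Crude script guess: first word of the Unicode character name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def canonical_symbol(identifier: str) -> str:
    """Return the NFC form of an identifier, or raise ValueError if rejected."""
    nfc = unicodedata.normalize("NFC", identifier)
    marks_in_a_row = 0
    scripts = set()
    for ch in nfc:
        if unicodedata.category(ch).startswith("M"):
            # Combining mark left over after composition.
            marks_in_a_row += 1
            if marks_in_a_row > MAX_MARKS_PER_BASE:
                raise ValueError("too many combining marks in identifier")
            continue
        marks_in_a_row = 0
        if ch.isalpha():
            scripts.add(script_of(ch))
    if len(scripts) > 1:
        raise ValueError("mixed scripts in identifier: " + ", ".join(sorted(scripts)))
    return nfc
```

With this sketch, `canonical_symbol("cafe\u0301")` returns the precomposed spelling, while a Latin name containing a Cyrillic "а" (U+0430) or a base letter with a long stack of accents raises `ValueError`. Note that the heuristic rejects every script mix, which is stricter than rejecting only confusable ones.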

account42 351 days ago

And those are not programming languages, or at least not the C programming language, which only needs a very limited character set.

steveklabnik 351 days ago

C does allow for limited Unicode in identifiers, though you need to use the \u prefix and write the code point out. Compilers like clang let it work like C++ and follow TR31, though this is nonstandard.

Y_Y 352 days ago

What; like APL‽
