	by smcameron 357 days ago
	Ugh. Are unicode variable names allowed in C now? That's horrific.

5 comments

OkayPhysicist 357 days ago

Why shouldn't they be? It's not the '00s anymore; Unicode support is universal. You'd have to dust off some truly ancient tech to find something incapable of rendering it.

Source code is for humans, and thus should be written in whatever way makes it easiest to read, write, and understand for humans. If your language doesn't map onto ASCII, then Unicode support improves that goal. If your code is meant to directly implement some physics formula, then using the appropriate unicode characters might make it easier to read (and thus spot transcription errors, something I find far too often in physics simulations).

bigstrat2003 357 days ago

They shouldn't be precisely because it makes the code harder to read and write when you include non-ASCII characters.

wheybags 357 days ago

Hot take, but I've always felt the world would be better served if mathematicians and physicists would stop using terrible short variable names and use longCamelCaseDescriptiveNames like the rest of us, because paper is cheap, and abbreviations are confusing. I know it's nicer when you're writing by hand, but when you clean up a proof or formula for publishing, would it really be so hard to switch to descriptive names?

I'm a practitioner of neither though, so I can't condemn the practice wholeheartedly as an outsider, but it does make me groan.

nsingh2 357 days ago

Better served to students and those unfamiliar with the field, but noisy to those familiar. Considering that much of mathematical work is done using pen/paper, it would be a total pain to write out huge variable names every time.

Consider a simple programming example: in C, blocks are delimited by `{}`. Why not use `block_begin` and `block_end`? Because it's noisy, and it doesn't take much to internalize the meaning of braces.

senbrow 357 days ago

Long names are good for short expressions, but they obfuscate complex ones because the identifiers visually crowd out the operators.

This can be especially difficult if the author is trying to map 1:1 to a complex algorithm in a white paper that uses domain-standard mathematical notation.

The alternative is to break the "full formula" into simpler expression chunks, but then naming those partial expression results descriptively can be even more challenging.

someplaceguy 357 days ago

> using the appropriate unicode characters might make it easier to read

It's probably also a great way to introduce almost undetectable security vulnerabilities by using Unicode characters that look similar to each other but in fact are different.

OkayPhysicist 357 days ago

This would cause your compilation to fail, unless you were deliberately declaring and using near identical symbols. Which would violate the whole "Code is meant to be easily read by humans" thing.

someplaceguy 357 days ago

> unless you were deliberately declaring and using near identical symbols.

Yes, that would probably be one way to do it.

> Which would violate the whole "Code is meant to be easily read by humans" thing.

I'd think someone who's deliberately and sneakily introducing a security vulnerability would want it to be undetectable, rather than easily readable.
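A minimal sketch of both sides of this exchange, written in Python rather than C purely for illustration. It assumes a hypothetical identifier `scope` and a look-alike spelled with a Cyrillic "о"; the script check is a rough heuristic based on Unicode character names, not a real confusables detector.

```python
import unicodedata


def scripts_used(identifier: str) -> set[str]:
    """Return the first word of each letter's Unicode name (e.g. LATIN, CYRILLIC)."""
    return {unicodedata.name(ch).split()[0] for ch in identifier if ch.isalpha()}


def looks_mixed(identifier: str) -> bool:
    """Flag identifiers whose letters come from more than one script (heuristic)."""
    return len(scripts_used(identifier)) > 1


latin_name = "scope"
spoofed_name = "sc\u043epe"  # U+043E CYRILLIC SMALL LETTER O replaces the Latin "o"

# Deliberately declaring both spellings works: they are distinct identifiers.
namespace: dict = {}
exec(f"{latin_name} = 1\n{spoofed_name} = 2", namespace)

# Using the look-alike without declaring it fails, the way an accidental typo would.
try:
    exec(f"{latin_name} = 1\nprint({spoofed_name})", {})
    typo_caught = False
except NameError:
    typo_caught = True

print(scripts_used(latin_name), scripts_used(spoofed_name), typo_caught)
```

An accidental look-alike is caught because the name was never declared, while a deliberate pair goes through unless a lint step such as `looks_mixed` flags it.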

1over137 357 days ago

Horrific? You might not think so if your (human) language used a different alphabet.

Joker_vD 357 days ago

My language uses Cyrillic and I personally prefer English-based keywords and variable names precisely because they are not words of my (human) language. It introduces an easy and obvious distinction between the machine-oriented and the human-oriented.

cryptonector 356 days ago

Yes, I also think the whole world should program in English.

That's half tongue in cheek. I am fluent in three languages, but I program "in English" and I greatly appreciate that my colleagues who are fluent in languages other than the ones I'm fluent in (except English) also do. Basically English is the world's lingua franca today. Nonetheless if a company in France wants to use French for their symbol names, or a company in Mexico wants to use Spanish for their symbol names, or a company in China wants to use Chinese for their symbol names, who am I to stop them?! Surely it's not my place.

ZoomZoomZoom 357 days ago

I know what you mean and I shudder when I see code that uses words from my native lang, but most code is human-oriented.

ajross 357 days ago

Little to no source code is written for single (human) language development teams. Sure, everyone would like the ability to write source code in their native language. That's natural.

Literally no one, anywhere, wants to be forced to read source written in a language they can't read (or more specifically in this case: written in glyphs they can't even produce on their keyboard). That idea, for almost everyone, seems "horrific", yeah.

So a lingua franca is a firm requirement for modern software development outside of extremely specific environments (FSB malware authors probably don't care about anyone else reading their Cyrillic variable names, etc...). Must it be ASCII-encoded English? No. But that's what the market has picked and most people seem happy enough with it.

OkayPhysicist 357 days ago

> Little to no source code is written for single (human) language development teams.

This is blatantly false. I'd posit that a solid 90% of all source code written is done so by single, co-located teams (a substantial portion of which are teams of 1). That certainly fits the bill for most companies I've worked at.

eqvinox 357 days ago

Yes but also no. The thing about software is that 90% of it is not culturally bound. If you're writing, say, some tax reporting tool, a grammar reference, or something religious… sure, it makes sense to write that in your language. So, yeah, C should support that.

However, everything else, from spreadsheet software to CAD tools to OS kernels to JavaScript frameworks is universal across cultures and languages. And for better or for worse (I'm not a native English speaker either), the world has gone with English for a lot of code commons.

And the thing with the examples in that post isn't about supporting language diversity, it's math symbols which are no one's native language. And you pretty much can't type them on any keyboard. Which really makes it a rather poor flex IMHO. Did the author reconfigure their keyboard layout for that specific math use case? It can't generically cover "all of math" either. Or did they copy&paste it around? That's just silly.

[…could some of the downvoters explain why they're downvoting?]

OkayPhysicist 357 days ago

When I was doing a lot of Physics simulation in Julia, I had a Vim extension which would just allow me to type something like \gamma, hit tab, and get γ. This was worth the (minimal) hassle, because it made it very easy to spot check formulas. When you're shuffling data around in a loosely-described space like most of web dev, descriptive function and variable names are important because the description of what you're doing and what you're doing it to is the important information, and the actual operations you're taking are typically approximately trivial.

In heavily mathematical contexts, most of those assumptions get turned on their head. Anybody qualified to be modifying a model of electromagnetism is going to be intimately familiar with the language of the formulas: mu for permeability, epsilon for permittivity, etc. With that shared context,

1/(4*π*ε)*(q_electron * q_proton)/r^2 is going to be a lot easier to see, at a glance, as Coulomb's law

compared to

1 / (4 * Math.Pi * permittivity_of_free_space)*(charge_electron * charge_proton)/distance_of_separation^2

Source code, like any other language built for humans, is meant to be read by humans. If those humans have a shared context, utilizing that shared context improves the quality and ease of that communication.
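For comparison, a minimal runnable sketch of the two spellings of Coulomb's law, in Python since it accepts Greek letters in identifiers. The constant values and the Bohr-radius test distance are assumptions added for illustration and are not from the comment above.

```python
import math

# Assumed SI constants (not taken from the thread).
ε0 = 8.8541878128e-12          # vacuum permittivity, F/m
q_electron = -1.602176634e-19  # electron charge, C
q_proton = 1.602176634e-19     # proton charge, C
π = math.pi


def coulomb_terse(r: float) -> float:
    """Coulomb's law written close to textbook notation."""
    return 1 / (4 * π * ε0) * (q_electron * q_proton) / r**2


def coulomb_verbose(distance_of_separation: float) -> float:
    """The same formula with descriptive ASCII names."""
    permittivity_of_free_space = ε0
    charge_electron = q_electron
    charge_proton = q_proton
    return (
        1 / (4 * math.pi * permittivity_of_free_space)
        * (charge_electron * charge_proton)
        / distance_of_separation**2
    )


bohr_radius = 5.29177210903e-11  # m, assumed separation for the example
force = coulomb_terse(bohr_radius)
print(force, coulomb_verbose(bohr_radius))
```

Both functions compute the same value; the difference is only in how quickly a reader who knows the notation can check the expression against the formula.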

eqvinox 357 days ago

Hrm. Fair point. But will the other humans, even if they have the shared context, also have the ability to type in these symbols, if they want to edit the code? They probably don't have your vim extension…

I guess maybe this is an argument for better UI/UX for symbolic input…
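One way such input helpers can work, sketched in Python under stated assumptions: a small table maps LaTeX-style names to characters, and a substitution pass expands them. The table contents and the `expand` function are hypothetical; this is not the Vim extension mentioned earlier in the thread.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny table of supported names.
GREEK_NAMES = ["alpha", "beta", "gamma", "epsilon", "mu", "pi"]
SYMBOLS = {
    name: unicodedata.lookup(f"GREEK SMALL LETTER {name.upper()}")
    for name in GREEK_NAMES
}


def expand(text: str) -> str:
    """Replace \\name with its symbol; unknown names are left untouched."""
    return re.sub(
        r"\\([A-Za-z]+)",
        lambda m: SYMBOLS.get(m.group(1), m.group(0)),
        text,
    )


print(expand(r"1/(4*\pi*\epsilon)"))
```

An editor plugin would typically trigger the same lookup on a keypress instead of rewriting whole strings.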

cryptonector 356 days ago

> […could some of the downvoters explain why they're downvoting?]

Because you made false assertions ("And you pretty much can't type them on any keyboard").

eqvinox 356 days ago

Please show me the keyboard layout that has keys for ⁺, ř and ₚ.

(Unless you're being pedantic because I wrote "keyboard" rather than "keyboard layout", or ignored the qualifying "pretty much". In either of those cases you're unwilling to communicate cooperatively and I can't help you.)

cryptonector 356 days ago

Search for compose key sequences.

eqvinox 356 days ago

> Search for compose key sequences.

I don't need to do that because I actively use them myself and have a custom ~/.XCompose. Also, please try communicating less condescendingly.

There is no default compose sequence for ₚ that I can find, at least in my Debian installation.

So, again, please point me at the layout that can output these characters.

And even with that: if you don't think Compose sequences, possibly even custom, are covered by "pretty much impossible", I must seriously question your perception & bias of how common (or not) things are.

mananaysiempre 357 days ago

“Now” as in since C99, twenty-five years ago, yes. (It seemed like a good idea at the time.)

kevincox 357 days ago

Being able to program in languages that don't fit into ASCII is a good idea. Using one-character variable names is a bad idea.

RossBencina 357 days ago

Mathematics is a language that doesn't fit into ASCII and commonly uses one-character variable names. If you are implementing a documented mathematical algorithm (i.e. one with a description in a paper or book) then sticking to the notation of the paper (i.e. using one character variable names) makes sense to me.

kevincox 357 days ago

I find math far easier to read when the authors use proper names for variables. But I understand that it isn't the idiomatic style and agree that it can be useful to match the paper when re-implementing an algorithm.

mananaysiempre 357 days ago

Unfortunately, many of the things of this nature that you’ll want to implement use indices, which are inevitably going to start at 1. So you’ll still have plenty of hours of unpleasant debugging ahead of you, and a non-obvious correspondence to the original paper at the end of it.
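A small sketch of the index mismatch, in Python, using a made-up formula S = sum over i = 1..n of i * x_i (the formula is an assumption chosen for illustration, not from any paper discussed here).

```python
def weighted_sum_paper_style(x: list[float]) -> float:
    """Keep the paper's 1-based i and shift only at the array access."""
    n = len(x)
    return sum(i * x[i - 1] for i in range(1, n + 1))


def weighted_sum_zero_based(x: list[float]) -> float:
    """Idiomatic 0-based loop; the +1 moves into the weight instead."""
    return sum((k + 1) * value for k, value in enumerate(x))


def weighted_sum_off_by_one(x: list[float]) -> float:
    """The easy mistake: copying the formula but looping from 0."""
    return sum(i * x[i] for i in range(len(x)))


data = [10.0, 20.0, 30.0]
print(weighted_sum_paper_style(data), weighted_sum_zero_based(data), weighted_sum_off_by_one(data))
```

The first version stays closest to the paper at the cost of an `i - 1` on every access; the second reads naturally as code but no longer matches the notation term by term.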

adrianN 357 days ago

Using variable names that are different but render (almost) the same can be a bad idea.

90s_dev 357 days ago

See also https://www.ethiocloud.com/bunnascript.aspx and https://en.wikipedia.org/wiki/Non-English-based_programming_...

SV_BubbleTime 357 days ago

> void recip(double* aₚ, double* řₚ)
> {
>     for (;;)
>     {
>         register double Π = (aₚ)(řₚ);

My first thought before I saw this was “I wonder if this is going to be an article from people who build things or something from “academics” that don’t.”

At least it was answered quickly.

loeg 357 days ago

Math people shouldn't be allowed to write code. It's not the unicode, so much as the extremely terse variable names.

perching_aix 357 days ago

Isn't that basically all C/C++ code? Admittedly I don't have much exposure to it, but it's pretty much a trope in and of itself, along with Java and C# suffering from the opposite problem.

Such a silly issue too, you'd think we'd have come up with some automated wrangling for this, so that those experienced with a codebase can switch over and see super short versions of identifiers, while people new to it all will see the long stuff.
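A rough sketch of what such wrangling could look like, in Python: a per-codebase alias table plus a rename pass that works in both directions. The table, the names in it, and the `rename` function are hypothetical, and the regex approach is an assumption that ignores strings, comments, and scoping, which a real tool would need to handle with a proper parser.

```python
import re

# Hypothetical alias table: short identifier -> descriptive identifier.
LONG_NAMES = {
    "F": "force",
    "k": "coulomb_constant",
    "q1": "charge_1",
    "q2": "charge_2",
    "r": "distance",
}
SHORT_NAMES = {long: short for short, long in LONG_NAMES.items()}


def rename(source: str, mapping: dict[str, str]) -> str:
    """Swap whole identifiers according to mapping; everything else is kept."""
    return re.sub(
        r"\b[A-Za-z_]\w*\b",
        lambda m: mapping.get(m.group(0), m.group(0)),
        source,
    )


terse = "F = k * q1 * q2 / r ** 2"
verbose = rename(terse, LONG_NAMES)
print(verbose)
print(rename(verbose, SHORT_NAMES))
```

Because the mapping is one-to-one, the view can be switched back and forth without changing what is stored on disk.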

flohofwoe 357 days ago

> Isn't that basically all C/C++ code?

Maybe for code that was written in the early 90's, but the only 'tradition' that has survived is calling the vanilla loop variable 'i'.