Why shouldn't they be? It's not the 00's anymore; Unicode support is universal. You'd have to dust off some truly ancient tech to find something incapable of rendering it.
Source code is for humans, and thus should be written in whatever way makes it easiest to read, write, and understand for humans. If your language doesn't map onto ASCII, then Unicode support serves that goal. If your code is meant to directly implement some physics formula, then using the appropriate Unicode characters might make it easier to read (and thus spot transcription errors, something I find far too often in physics simulations).
Hot take, but I've always felt the world would be better served if mathematicians and physicists would stop using terrible short variable names and use longCamelCaseDescriptiveNames like the rest of us, because paper is cheap, and abbreviations are confusing.
I know it's nicer when you're writing by hand, but when you clean up a proof or formula for publishing, would it really be so hard to switch to descriptive names?
I'm a practitioner of neither though, so I can't condemn the practice wholeheartedly as an outsider, but it does make me groan.
That would better serve students and those unfamiliar with the field, but it would be noisy to those familiar with it. Considering that much of mathematical work is done using pen/paper, it would be a total pain to write out huge variable names every time.
Consider a simple programming example: in C, blocks are delimited by `{}`. Why not use `block_begin` and `block_end`? Because it's noisy, and it doesn't take much to internalize the meaning of braces.
Long names are good for short expressions, but they obfuscate complex ones because the identifiers visually crowd out the operators.
This can be especially difficult if the author is trying to map 1:1 to a complex algorithm in a white paper that uses domain-standard mathematical notation.
The alternative is to break the "full formula" into simpler expression chunks, but then naming those partial expression results descriptively can be even more challenging.
> using the appropriate unicode characters might make it easier to read
It's probably also a great way to introduce almost undetectable security vulnerabilities by using Unicode characters that look similar to each other but in fact are different.
This would cause your compilation to fail, unless you were deliberately declaring and using near identical symbols. Which would violate the whole "Code is meant to be easily read by humans" thing.
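To make the lookalike concern concrete, here is a minimal Python sketch. The names `latin_name`, `mixed_name`, `scripts_used` and `is_mixed_script` are made up for illustration, and the script check is a rough heuristic based on Unicode character names, not a full confusables check.

```python
import unicodedata

# Two names that render almost identically. The first is all Latin;
# the second starts with CYRILLIC SMALL LETTER A (U+0430).
latin_name = "admin"
mixed_name = "\u0430dmin"


def scripts_used(identifier):
    """Return a rough script label for each letter, taken from the first
    word of its Unicode name (for example 'LATIN', 'CYRILLIC', 'GREEK')."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def is_mixed_script(identifier):
    """Flag identifiers whose letters come from more than one script."""
    return len(scripts_used(identifier)) > 1


# Both strings are legal identifiers, yet they are different names.
both_valid = latin_name.isidentifier() and mixed_name.isidentifier()
same_name = latin_name == mixed_name
```

Both strings are accepted as identifiers and compare unequal, so a reference to one will not resolve to the other unless both were declared. A linter could use a check like `is_mixed_script` to flag the second name, though a real tool would want a proper confusables table.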
My language uses Cyrillic and I personally prefer English-based keywords and variable names precisely because they are not words of my (human) language. It introduces an easy and obvious distinction between the machine-oriented and the human-oriented.
Yes, I also think the whole world should program in English.
That's half tongue in cheek. I am fluent in three languages, but I program "in English" and I greatly appreciate that my colleagues who are fluent in languages other than the ones I'm fluent in (except English) also do. Basically English is the world's lingua franca today. Nonetheless if a company in France wants to use French for their symbol names, or a company in Mexico wants to use Spanish for their symbol names, or a company in China wants to use Chinese for their symbol names, who am I to stop them?! Surely it's not my place.
Little to no source code is written for single (human) language development teams. Sure, everyone would like the ability to write source code in their native language. That's natural.
Literally no one, anywhere, wants to be forced to read source written in a language they can't read (or more specifically in this case: written in glyphs they can't even produce on their keyboard). That idea, for almost everyone, seems "horrific", yeah.
So a lingua franca is a firm requirement for modern software development outside of extremely specific environments (FSB malware authors probably don't care about anyone else reading their cyrillic variable names, etc...). Must it be ASCII-encoded English? No. But that's what the market has picked and most people seem happy enough with it.
> Little to no source code is written for single (human) language development teams.
This is blatantly false. I'd posit that a solid 90% of all source code is written by single, co-located teams (a substantial portion of which are teams of 1). That certainly fits the bill for most companies I've worked at.
Yes but also no. The thing about software is that 90% of it is not culturally bound. If you're writing, say, some tax reporting tool, a grammar reference, or something religious… sure, it makes sense to write that in your language. So, yeah, C should support that.
However, everything else, from spreadsheet software to CAD tools to OS kernels to JavaScript frameworks is universal across cultures and languages. And for better or for worse (I'm not a native English speaker either), the world has gone with English for a lot of code commons.
And the thing with the examples in that post isn't about supporting language diversity; it's math symbols, which are no one's native language. And you pretty much can't type them on any keyboard. Which really makes it a rather poor flex IMHO. Did the author reconfigure their keyboard layout for that specific math use case? It can't generically cover "all of math" either. Or did they copy & paste it around? That's just silly.
[…could some of the downvoters explain why they're downvoting?]
When I was doing a lot of physics simulation in Julia, I had a Vim extension which would just allow me to type something like \gamma, hit tab, and get γ. This was worth the (minimal) hassle, because it made it very easy to spot check formulas. When you're shuffling data around in a loosely-described space like most of web dev, descriptive function and variable names are important because the description of what you're doing and what you're doing it to is the important information, and the actual operations you're taking are typically approximately trivial.
In heavily mathematical contexts, most of those assumptions get turned on their head. Anybody qualified to be modifying a model of electromagnetism is going to be intimately familiar with the language of the formulas: mu for permeability, epsilon for permittivity, etc. With that shared context,
1/(4*π*ε)*(q_electron * q_proton)/r^2 is going to be a lot easier to see, at a glance, as Coulomb's law.
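As a rough illustration of the comparison, here is the same formula as runnable Python, once with the symbols from the line above and once with descriptive names. The function names, the sample distance, and the choice of Python are my assumptions; note that Python spells exponentiation `**`, while `^` in the formula above matches Julia.

```python
import math
from math import pi as π  # Greek letters are valid Python identifiers

ε0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018, rounded)
e = 1.602176634e-19    # elementary charge in C


def coulomb_force(q1, q2, r, ε=ε0):
    """Signed force in newtons between point charges q1 and q2 (coulombs)
    separated by r metres; the result is negative for opposite charges."""
    return 1 / (4 * π * ε) * (q1 * q2) / r**2


def coulomb_force_descriptive(first_charge, second_charge, separation,
                              permittivity=ε0):
    """The same formula with descriptive names, for comparison."""
    coulomb_constant = 1 / (4 * math.pi * permittivity)
    return coulomb_constant * first_charge * second_charge / separation**2


q_electron, q_proton = -e, e
sample_distance = 5.29177e-11  # metres, roughly the Bohr radius
force = coulomb_force(q_electron, q_proton, sample_distance)
```

Whether the first or the second version is easier to check against a textbook is exactly the judgment call being argued here; the two compute the same value.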
Source code, like any other language built for humans, is meant to be read by humans. If those humans have a shared context, utilizing that shared context improves the quality and ease of that communication.
Hrm. Fair point. But will the other humans, even if they have the shared context, also have the ability to type in these symbols, if they want to edit the code? They probably don't have your vim extension…
I guess maybe this is an argument for better UI/UX for symbolic input…
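For what it's worth, the core of such an input helper is small. Below is a minimal sketch of LaTeX-style abbreviation expansion in Python; the table contents and the name `expand_abbreviations` are hypothetical, and this is not how the Vim extension mentioned above is implemented.

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real editor plugin would ship a much
# larger one. Characters are looked up by their official Unicode names.
ABBREVIATIONS = {
    r"\gamma": unicodedata.lookup("GREEK SMALL LETTER GAMMA"),
    r"\epsilon": unicodedata.lookup("GREEK SMALL LETTER EPSILON"),
    r"\mu": unicodedata.lookup("GREEK SMALL LETTER MU"),
    r"\pi": unicodedata.lookup("GREEK SMALL LETTER PI"),
}

# A backslash followed by ASCII letters, as in \gamma.
_ABBREVIATION_PATTERN = re.compile(r"\\[A-Za-z]+")


def expand_abbreviations(text):
    """Replace known abbreviations with their Unicode character and leave
    unknown ones untouched."""
    return _ABBREVIATION_PATTERN.sub(
        lambda match: ABBREVIATIONS.get(match.group(0), match.group(0)),
        text,
    )
```

An editor would run something like this on the word before the cursor when tab is pressed, so anyone editing the code only needs the ASCII abbreviation on their keyboard.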
Please show me the keyboard layout that has keys for ⁺, ř and ₚ.
(Unless you're being pedantic because I wrote "keyboard" rather than "keyboard layout", or ignored the qualifying "pretty much". In either of those cases you're unwilling to communicate cooperatively and I can't help you.)
I don't need to do that because I actively use them myself and have a custom ~/.XCompose. Also, please try communicating less condescendingly.
There is no default compose sequence for ₚ that I can find, at least in my Debian installation.
So, again, please point me at the layout that can output these characters.
And even with that: if you don't think Compose sequences, possibly even custom, are covered by "pretty much impossible", I must seriously question your perception & bias of how common (or not) things are.
Mathematics is a language that doesn't fit into ASCII and commonly uses one-character variable names. If you are implementing a documented mathematical algorithm (i.e. one with a description in a paper or book) then sticking to the notation of the paper (i.e. using one character variable names) makes sense to me.
I find math far easier to read when the authors use proper names for variables. But I understand that it isn't the idiomatic style and agree that it can be useful to match the paper when re-implementing an algorithm.
Unfortunately, many of the things of this nature that you’ll want to implement use indices, which are inevitably going to start at 1. So you’ve still got plenty of hours of unpleasant debugging ahead of you, and a non-obvious correspondence to the original paper at the end of it.
My first thought before I saw this was “I wonder if this is going to be an article from people who build things or something from ‘academics’ who don’t.”
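One way to keep a paper's 1-based numbering visible in a 0-based language is a thin wrapper, sketched below in Python. The class `OneBased` and the example sum are hypothetical; this only moves the off-by-one translation into a single place, it doesn't make it go away.

```python
class OneBased:
    """Hypothetical wrapper that exposes a list with 1-based indices,
    so code can keep a paper's x_1 ... x_n numbering."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Iterate the underlying list directly; without this, Python would
        # fall back to indexing from 0 and stop immediately.
        return iter(self._items)

    def _check(self, i):
        if not 1 <= i <= len(self._items):
            raise IndexError(f"index {i} outside 1..{len(self._items)}")

    def __getitem__(self, i):
        self._check(i)
        return self._items[i - 1]

    def __setitem__(self, i, value):
        self._check(i)
        self._items[i - 1] = value


# Example: the sum over i = 1..n of i * x_i, written as in a paper.
x = OneBased([10, 20, 30])
weighted_sum = sum(i * x[i] for i in range(1, len(x) + 1))
```

Rejecting index 0 loudly is a deliberate choice here: a silent wrap-around to the last element is the kind of bug that costs those hours of debugging.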
Isn't that basically all C/C++ code? Admittedly I don't have much exposure to it, but it's pretty much a trope in and of itself, along with Java and C# suffering from the opposite problem.
Such a silly issue too, you'd think we'd have come up with some automated wrangling for this, so that those experienced with a codebase can switch over and see super short versions of identifiers, while people new to it all will see the long stuff.