Hacker News
Most of the World Can't Code
10 points by jayathra 452 days ago
Programming is only accessible to those who understand English and the Latin alphabet.

If you don’t, your chances of becoming a programmer drop drastically - not because you lack intelligence, but because everything from syntax to documentation and debugging tools is built in English.

Why is coding still tied to a single language? Shouldn’t anyone, regardless of their native script, be able to write Python in Japanese, Arabic, Sinhala, or Hindi - while keeping full compatibility with existing ecosystems?

Has anyone here faced or thought about this problem? What do you think the biggest challenges would be?

11 comments

This is by no means unique to programming. Many areas of knowledge are less accessible to those who don't speak English, and much more so to those who don't speak any of the dozen major languages. Because of this, many people will simply learn (enough) English. It's not such a big deal.

In my view, having a single lingua franca is nice. It better facilitates knowledge transfer. I wouldn't want to see a fracturing where each area of knowledge (or, say, every specialization/application programming) is best treated in a distinct linguistic community. That would be bad for everyone.

I see your point - having a global language certainly helps with knowledge transfer, and English has become that standard for programming. But is learning English really 'not a big deal' for everyone? For someone with limited access to quality English education, it could take years before they’re comfortable enough to learn programming effectively. Meanwhile, a fluent English speaker can start coding right away.

Rather than fragmenting knowledge, what if we had a system that let people write and learn code in their native script, while still maintaining full compatibility with the existing programming ecosystem? Similar to how Unicode enables multiple languages on the web without breaking global communication. Do you think that could work?

https://en.wikipedia.org/wiki/ALGOL_68#Example_of_different_...

Algol 68 offered that, for keywords at least. Variables, procedure names, and the contents of strings can't be localized quite the same way (though localization for content is easier now than it used to be if you don't embed the string text directly in the source).

If we switched from a text file based representation of code to a different structure, localization could be performed more easily for source code, even down to comments and variables. However, while this would help you work on a project I started, taken too far it would not help us work together (we'd refer to the same variables and procedures by different names). We'd still need to select a common language when collaborating.

Other languages today at least work well with Unicode source, but they retain English-based keywords (Go, Rust, probably others, but those two like to tout it specifically).

Localizing everything could make collaboration harder if different people refer to the same function by different names.

But what if we had a system where people wrote code in their native script, and it automatically translated into a universal format when shared? Would that help keep things accessible while maintaining collaboration?
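
As a rough illustration of the idea above, keyword-only translation can be sketched with Python's standard `tokenize` module. The Japanese keyword table, the `to_python` helper, and the sample function are all assumptions made up for this example, not an existing tool or standard. Only keyword tokens are mapped; identifiers, strings, and comments pass through untouched.

```python
import io
import tokenize

# Hypothetical Japanese-to-Python keyword table (illustrative only).
KEYWORDS_JA = {
    "関数": "def",
    "もし": "if",
    "戻る": "return",
}

def to_python(source: str, table: dict) -> str:
    """Replace localized keywords with Python keywords, token by token."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        # Only NAME tokens are candidates; strings and comments are other token types.
        if tok.type == tokenize.NAME and text in table:
            text = table[text]
        out.append((tok.type, text))
    # With (type, string) pairs, untokenize emits valid but loosely spaced code.
    return tokenize.untokenize(out)

localized = (
    "関数 絶対値(x):\n"
    "    もし x < 0:\n"
    "        戻る -x\n"
    "    戻る x\n"
)

namespace = {}
exec(to_python(localized, KEYWORDS_JA), namespace)
print(namespace["絶対値"](-3))  # prints 3
```

Note that the function name itself stays in Japanese after translation, which is exactly the collaboration problem raised in this thread: mapping keywords is mechanical, mapping names is not.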

That's what I suggested, but it means our person-to-person collaboration becomes more challenging, though if we work at arm's length from each other it might be more feasible. You always work in your translated code, I always work in mine. But direct communication about code breaks down because we need to have a translator and decide on the common terms.

We could probably collaborate at a higher level (system architecture and design). I'm building a compiler, I can share the architecture with you and have you implement a particular optimization pass. But if I want to read your optimization pass code and discuss it with you then one of us has to learn the same code twice: once in our native language and once in whatever is decided to be the shared language.

Store the code in a database (or something akin to one) and use a structured editor and this mode is technically feasible. It would open up work for people who are not native in the original language, but you also need to ensure that code has a translation. So you're still going to need someone (or something) to do the localization as well.
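
A minimal sketch of that storage model, assuming code is kept as references to symbol IDs and names are looked up per language only at display time. The `Call` record, the `SYMBOLS` table, and the Spanish names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    func: int     # symbol ID of the procedure being called
    args: tuple   # symbol IDs of the arguments

# Hypothetical symbol table: symbol ID -> display name per language.
SYMBOLS = {
    1: {"en": "push", "es": "apilar"},
    2: {"en": "stack", "es": "pila"},
    3: {"en": "item"},  # nobody has localized this one into Spanish yet
}

def name_of(symbol_id: int, lang: str, fallback: str = "en") -> str:
    """Look up a display name, falling back when no translation exists."""
    names = SYMBOLS[symbol_id]
    return names.get(lang, names[fallback])

def render(call: Call, lang: str) -> str:
    """Show the same stored call in the reader's language."""
    args = ", ".join(name_of(a, lang) for a in call.args)
    return f"{name_of(call.func, lang)}({args})"

stored = Call(func=1, args=(2, 3))
print(render(stored, "en"))  # push(stack, item)
print(render(stored, "es"))  # apilar(pila, item)
```

The untranslated `item` in the Spanish rendering shows the gap mentioned above: the stored form is shared, but every name still needs someone (or something) to supply a translation.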

OSS projects can't afford this, but commercial efforts might be able to. On the other hand, commercial projects can afford to pay the $2/hour extra (companies are cheap bastards) to hire those English speakers in your country and ignore the non-English speakers.

There are loads of programming languages which have nothing to do with the English language.

Assembly:

   LDA #$01
   STA $0200
   LDA #$05
   STA $0201
   LDA #$08
   STA $0202

Brainfuck:

   >>,[>>,]<<[
   [<<]>>>>[
   <<[>+<<+>-]
   >>[>+<<<<[->]>[<]>>-]
   <<<[[-]>>[>+<-]>>[<<<+>>>-]]
   >>[[<+>-]>>]<
   ]<<[>>+<<-]<<
   ]>>>>[.>>]

Oh, you meant easy-to-learn programming languages based on a real language? Yeah, English just happens to be one of the easier languages to learn, and if you need to learn a programming language you can just as well learn English on the side. I did.

I’m curious: do you think learning English is equally easy for everyone? Many programmers come from regions where English education is either poor or expensive. If someone is highly logical but struggles with English, should that be a barrier to learning how to code? Also, what was your level of English, and your access to English education, before you learned it? Did you start from scratch? Did you know a language that shared a root language with English?

I literally learned English reading the only computer books that were available at the time, in the mid 80s. That was before I had a single English class in school - in fact I already knew English pretty well by the time I started studying English in 7th grade. I come from Finland and the Finnish language has absolutely nothing to do with any other language (except Estonian); no words are even remotely similar to English.

Also I totally suck at learning languages. I've tried to learn Swedish, nope, German, nope, Spanish/Portuguese, also nope.

And if you think the Finnish language is related to some other language, feel free to think so. Last weekend I tried to teach a few Finnish words and inflected forms to a language teacher, who managed to listen for about half an hour and concluded that it is "impossible to learn because there is no reference to other languages".

That’s impressive. But do you think your experience is typical or more of an exception?

What if coding was built from the ground up to be script-agnostic—where people didn’t need to 'learn English on the side' at all?

If a Finnish speaker could code in Finnish while collaborating with a Japanese speaker coding in Japanese — and the system translated everything seamlessly — do you think that would increase access to programming without fragmenting the codebase?

The Finnish version of Excel used to have all of the functions translated to Finnish (in the late 90s, when I was doing Excel for a living). It was literally impossible to do anything with it, as none of the documentation knew about the translated functions and it was very hard to self-translate the functions (as in there was mostly no logic in the Finnish function names thanks to the way the Finnish language works).

What was supposedly done in good faith to make it easier for non-English speaking Finns to do Excel functions ended up making it impossible for everyone. If you didn't know Excel then =IF() was just as cryptic as =JOS() and if you did know Excel then you couldn't figure out why =IF() didn't work. At least .xls files were compatible because apparently functions were saved as opcodes and not as strings.
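
The behavior described above, where files stay compatible while typed names do not, can be sketched as follows. This is an assumption-laden illustration, not Excel's actual file format: the opcode numbers, the `parse_name` and `render_name` helpers, and the name tables are hypothetical (JOS and XVERWEIS are the localized names mentioned in this thread; WENN is the German name for IF).

```python
from enum import IntEnum

class Op(IntEnum):
    # Hypothetical opcode numbers, not Excel's real ones.
    IF = 1
    XLOOKUP = 2

# Localized function names per UI language; the stored form is always the opcode.
NAMES = {
    "en": {Op.IF: "IF", Op.XLOOKUP: "XLOOKUP"},
    "fi": {Op.IF: "JOS"},
    "de": {Op.IF: "WENN", Op.XLOOKUP: "XVERWEIS"},
}

def parse_name(name: str, locale: str) -> Op:
    """Accept only the name used by the given locale, as typed by the user."""
    for op, local_name in NAMES[locale].items():
        if local_name == name.upper():
            return op
    raise ValueError(f"unknown function {name!r} in locale {locale!r}")

def render_name(op: Op, locale: str) -> str:
    """Display an opcode in the reader's locale, falling back to English."""
    return NAMES[locale].get(op, NAMES["en"][op])

saved = parse_name("JOS", "fi")    # typed in Finnish Excel, stored as an opcode
print(render_name(saved, "de"))    # WENN
print(render_name(saved, "en"))    # IF

try:
    parse_name("IF", "fi")         # the English name typed into the Finnish UI
except ValueError as err:
    print(err)
```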

I haven't used non-English software since, so no idea if Finnish Excel still has translated functions. Hope not.

Assembly instructions are English mnemonics. LDA->LoaD Accumulator, STA->STore Accumulator, ADD, SUB, JMP, MOV, etc.

Exactly! Even in low-level programming like Assembly, the core instructions are still based on English. Do you think there’s a way to design programming languages that don’t rely on English at all - not just in keywords, but in how concepts are structured?

APL. I also doubt Erlang (Ericsson's language) was based on English.

APL and possibly Erlang aren’t based on English, yet they never became the global standard.

Do you think it was because of technical reasons, or was it just easier for English-based languages (C, Python, JavaScript) to spread globally? If we designed a non-English programming system today, do you think it would actually gain adoption?

We've had localized programming languages: Visual Basic for Applications in Germany had "prüfe" instead of "select" (yes, with Umlaut): https://de.wikipedia.org/wiki/Visual_Basic_for_Applications?...

Today German Excel still doesn't accept "XLOOKUP", but insists on "XVERWEIS". On input, that is; it silently converts languages when opening .xlsx files.

But do you think they failed because they were localized per country, rather than designed for global compatibility from the start?

Silent conversions could definitely be messy, but what if there were a standardized system that allowed programming in any script while keeping everything interoperable? Could that avoid the pitfalls of past localized languages?

> silently converts languages when opening .xlsx files.

Sounds like a bug waiting to happen

I used to use Excel in school in Hungarian and at home in English.

I don't think there were bugs in opening files, but I sure got some useless error messages when I used the wrong language!

everything is a bug waiting to happen… :)

One option is to adopt a front end for your language for existing (legacy) languages that have been overwhelmingly developed by people who write using Latin letters: you type in your language and the front end maps it to Latin letters. There are various issues here, depending on the written form of your language, however. Latin letters are effectively block type glyphs, which lend themselves to programming.

Another option is for future languages to be formally specified in a globally adopted IL, and then your local area geeks are responsible for writing a front end that transpiles to that IL.

Or we could design and adopt a universal (~visual) glyph for programming. Various structural elements (think [ ], { }, < >, etc.) are pretty much that already. Then we have the (pseudo) mathematical elements (+, -, /, =) which are again universal. That leaves us with named elements which remain somewhat problematic.

In any event, all this seems to be a transitional period's grief. Very soon, you will interact in your native language with some AI and that thing will write the actual code. :)

Regardless (thinking of music notation here) programming notation is ultimately a specialized form of notation. Are you bothered by the fact that a musician in x-land has to learn the notation invented by some Europeans way back when?

I love the idea of a universal intermediate language (IL) with region-specific front-ends—that could be a great way to make programming more accessible without fragmenting the ecosystem.

But with AI handling more code generation, how important will it be for people to truly understand the underlying code? Do you think AI will make coding more of a black box, or will there always be value in knowing how things work under the hood?

Music is a great comparison—eastern music notation exists in native scripts, and western pieces can be translated into it. Could programming work the same way, where the structure remains universal, but the notation adapts to different languages?

TIL - I did not know about eastern music notation. (Thanks!)

When I was young I had a vision of future programming as people in front of screens moving colorful shapes and forms (not talking visual programming here) to make 'harmonious' forms. :) The general idea being that (imo) AI is a misnomer and there is something 'special' about human intelligence. So that vision, when I tried to interpret it later, seemed to map out to something along the lines of 'aesthetic choices' on a meta-level. That is the 'thinking' machine 'state' was represented as images to humans and they made aesthetic choices, with man and machine each doing what they excel at.

But back to present reality, there is little doubt that overreliance on these tools will cause skill atrophy and at some point there will be a knowledge and comprehension disconnect between the operator of the tool and the artifacts created by it. This is likely already true for many beginners who are cranking out software using LLMs, but the overall field hasn't yet experienced it since the experienced software engineers already know and understand the code being generated; they are just using it to amplify their output. But they (imo) gained that knowledge due to years of hands-on practice.

This is a really delayed reply, so I'm hoping you'll see this.

> people in front of screens moving colorful shapes and forms

along those lines...

> (not talking visual programming here)

...even though you wrote that, I'd like you to check this out:

https://blockstud.io/tutorial/0

There are, of course, programming languages that have non-English keywords, they're just not very popular.

I guess if you're learning all of C/C++/Java/Python/etc... the "English" keyword meanings are a tiny/trivial part of what you need to learn anyway.

Also, using English means you only need ASCII, and a US keyboard layout which allows easy entry of the printable ASCII characters. For Japanese, Arabic, etc., you need Unicode, input methods, UTF-8 / UTF-16, etc., and all of a sudden there's a whole lot more to go wrong than if you use English in ASCII.
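
Two small examples of what can go wrong once source text leaves ASCII, sketched with Python's standard `unicodedata` module. The sample strings are arbitrary choices for illustration: the same visible word can be stored as different code point sequences, and the byte length of a character depends on the encoding.

```python
import unicodedata

composed = "caf\u00e9"     # 'é' stored as a single code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent

# Both render identically, but they are different strings until normalized.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

kana = "\u3042"  # HIRAGANA LETTER A
# ASCII letters are 1 byte in UTF-8; this kana is 3 bytes in UTF-8 and 2 in UTF-16.
print(len("a".encode("utf-8")), len("a".encode("utf-16-le")))    # 1 2
print(len(kana.encode("utf-8")), len(kana.encode("utf-16-le")))  # 3 2
```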

ASCII and a US keyboard do keep things simple. But given that modern systems already support Unicode everywhere (from websites to filenames), do you think the complexity argument still holds today?

When I was very young in the 1950s I remember my older brother opting to learn German because so many scientific papers were written in it.

That said, I later opted for Latin for reasons neither I nor the examiners could explain.

Do you think we’ll always be stuck in this cycle, or is it possible to design programming languages that aren’t tied to any one language, preventing this issue in the future?

APL and J aren't based on English. Several countries have developed local language-based programming languages that have simply stayed local. Global adoption of programming languages will be affected by global language, amongst other factors. English is currently the global language. Your guess is as good as mine for how long that will remain the case, but so long as English is the global lingua franca, then programming will largely be done in English.

English is the global standard now, and that’s shaped programming as well. But what happens if, in the future, another language (like Hindi or Mandarin) becomes dominant? Would the programming world have to shift again?

If programming were script-agnostic from the start, we wouldn’t have to constantly adapt to shifting global languages. Instead of relying on English or any single language, shouldn’t we explore ways to make programming more accessible to all scripts from the ground up?

I think having a lingua franca has way more advantages than drawbacks. As a native Spanish speaker, I know it is easier for me than for someone whose first language doesn't come from Latin, but it is really not much of a hassle.

Even coming from one of the most widely used languages in the world (and having just enough English), it is very rare that I search for or read something programming-related in a language other than English.

For Latin-based language speakers like Spanish, Italian, or French, learning programming in English might not be a huge hurdle. But for people whose native languages use completely different scripts (Chinese, Arabic, Sinhala, etc.), the challenge is much greater.

You mentioned that you rarely search for programming-related content in Spanish. Do you think that’s because English is simply better suited for programming, or is it more about the lack of high-quality programming resources in Spanish?

If programming had built-in support for multiple scripts while keeping a universal structure, do you think more people would use resources in their native language?

Even if I can learn programming concepts in Spanish or any other language, I will need to use English to communicate with my peers. Building a jargon for concepts that I cannot communicate is a waste of time. No one will understand me (maybe not even other Spanish speakers) if I refer to a stack as a "pila". So yes, even with high-quality content in other languages, I will still be referring to the English version.

That said, if in the future that status quo changes and Chinese dominates the programming world... well, then I'll be reading in Chinese (and I will have a hard time doing so).

I was programming BASIC and Pascal as a non-English-speaking kid and the Latin-based syntax was not a problem at all. I think you underestimate how far sheer drive can get you. That said, I do agree that a basic understanding of English is more or less a requirement these days if you want to stay up to date on modern developments. If not, well, many languages have enough translated materials to get by.

That’s amazing! I have visited parts of the world where access to resources for learning English is minimal. I don't know the specifics of your story/background, but especially when the native language doesn't involve the Latin alphabet at all, the learning curve is steeper. A fluent English speaker can begin coding immediately, while a non-English speaker has to learn two languages at once.

Even if basic materials exist in other languages, most advanced documentation, debugging tools, and libraries remain in English. Do you think that creates a significant disadvantage for non-English speakers?

Well, in my experience lack of English was more of a small bump rather than having to learn a whole new language. The number of keywords used was fairly limited (if, for, while, begin, end, goto, etc) and not that difficult to memorize. I certainly didn't come out of it with some newfound understanding of English language. It did help that English (and Latin alphabet) is so pervasive in the society that words are easy to sound out even without knowing the particulars. If programming in Hindi or Mandarin was the standard, a lot of us would be out of luck :)

I do agree with your last point - the majority of documentation is indeed in English. I learned programming before the Internet, with little access to books (they existed, but were hard to find), and mostly relied on translated help files. Growing up these days, I'd definitely be soaking up English to be able to navigate all the available information - feels like a fair trade-off! I do think it's convenient that there's a lingua franca, so to speak, in software dev - there's enough variation in programming languages that it's nice not to have to deal with additional fracturing along spoken language lines.

You could also remove the text from the programming "language", like in this research:

https://dl.acm.org/doi/10.1145/3173574.3174196

Thanks for sharing this! The idea of removing text from programming languages is really interesting, especially for making coding more accessible to people with different language backgrounds.

The paper you linked explores text-free programming through visual programming tools, which is one possible solution. But what do you think about making existing text-based languages, like Python or JavaScript, script-agnostic instead of moving entirely to visual coding?

Would love to hear your thoughts on whether a hybrid approach — where code can be written in any script but still remains text-based — could work.

(Apologies for the delayed response!)

> making existing text-based languages, like Python or JavaScript, script-agnostic

What might that look like? There are non-obvious challenges to merely translating the keywords into a foreign language (gender, declension, etc.).

If you're deeply interested in this topic, I'm happy to connect! (hn username at goog's mail service dot c o m).

I don't think the language itself being English is an issue at all because there's only 50-100 keywords per language, maybe. It's all the code comments and documentation/manuals/discussion that can be the issue.

What do you think is a more realistic solution? Should we focus on improving translations for documentation and learning materials, or is there a way to make programming itself less dependent on a single language?

The universe gave us AI which translates pretty well (and soon to be perfect). There was no one seriously looking to address this, it was just magically addressed when these LLMs showed up.