Hacker News
by bikeshaving 275 days ago
There are socio-economic reasons why the early computing boom (ENIAC, UNIVAC, IBM mainframes, early programming languages like Fortran and COBOL) was dominated by the US: massive wartime R&D, university infrastructure, and a large domestic market. But I wonder whether the Anglophone world also had an orthographic advantage. English uses 26 letters with no diacritics, compared to other languages like Chinese (thousands of characters), Hindi (50+ letters), or French/German (Latin with diacritics).

That simplicity made early character encodings like 7-bit ASCII feasible, which in turn lowered the hardware and software barriers for building computers, keyboards, and programming languages. In other words, the Latin alphabet’s compactness may have given English-speaking engineers a “low-friction” environment for both computation and communication. English is now the lingua franca of most computing, and support for other languages is built on top of it.

It’s very interesting to think about how written scripts give different cultures advantages in computing and elsewhere. I wonder, for instance, how scripts and AI interact: LLMs trained on Chinese are working with a high-density orthography and a stable, 3,500-year dataset.

7 comments

> English uses 26 letters with no diacritics, compared to other languages like Chinese (thousands of characters), Hindi (50+ letters), or French/German (Latin with diacritics).

The English language has diacritics (see words like naïve, façade, résumé, or café). It's just that the English language uses them so rarely that they are largely dropped in any context where they are hard to introduce. Note that this adaptation to lack-of-diacritic can be found in other Latin script languages: French similarly is prone to loss-of-diacritic (especially in capital letters), whereas German has alternative spelling rules (e.g., Schroedinger instead of Schrödinger).
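The two fallbacks described above can be sketched in Python. This is only an illustration, not a complete transliteration scheme: the function names `strip_diacritics` and `german_ascii` and the small `GERMAN_DIGRAPHS` table are assumptions made for this example.

```python
import unicodedata

# Hypothetical table for the German respelling rule (umlauts only).
GERMAN_DIGRAPHS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

def strip_diacritics(text: str) -> str:
    """Drop diacritics, as English and French tend to do."""
    # NFD splits a letter like 'é' into 'e' plus a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def german_ascii(text: str) -> str:
    """Respell umlauts as digraphs, as German conventionally does."""
    # NFC makes sure each umlaut is a single precomposed character.
    composed = unicodedata.normalize("NFC", text)
    return "".join(GERMAN_DIGRAPHS.get(ch, ch) for ch in composed)

print(strip_diacritics("naïve façade résumé café"))
print(german_ascii("Schrödinger"))
```

The first call drops the marks entirely, while the second keeps the distinction by spending an extra letter, which is why the German form stays unambiguous in plain ASCII.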

The same applies to why China had all the building blocks (pun intended) of the printing press but it was perfected by Gutenberg in Europe, where the number of glyphs was much more manageable.
Indeed. Even if you try to split hanzi into parts it's far more unwieldy (https://en.wikipedia.org/wiki/Kangxi_radicals).
Computer character codes descended directly from pre-computer codes, either teletype or punched card. The advantage holds back through printing to writing itself; having a small, fixed set of glyphs that can represent anything is just better.
I really wonder why Arabic has never gone back to printing. What we think of as the Arabic "alphabet" is just its cursive form. They have an alphabet that is basically just Syriac. It would have been easier to render on low-bit displays, and there would be no word-initial variants etc. to deal with.

https://en.wikipedia.org/wiki/Nabataean_script

Same with Japan using mostly kanji when they have a syllabary available (while Korea invented a pretty neat alphabet and largely dropped hanja).
Japanese has (slightly) more homophones and favors monosyllabic Sino-Japanese in compound words. That makes it hard to depend entirely on phonetic script. Same reason why English retains irregular spellings to help with some disambiguation.
We got lots done with 6-bit pre-ASCII encodings, actually, like CDC Display Code and Univac's Fieldata. It's more than enough for 26 letters, 10 digits, and lots of punctuation. And there are faint echoes of these early character sets remaining in ASCII -- a zero byte is ^@, for example, because @ was the zero-valued Fieldata "master space" character, which distinguished EXEC 8 control cards from source code and data cards.
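A quick back-of-the-envelope check of the "more than enough" claim, as a rough Python sketch. It only counts code points and does not reproduce the actual CDC Display Code or Fieldata tables; the variable names are made up for this example.

```python
import string

# A 6-bit code has 2**6 distinct values.
code_points = 2 ** 6

letters = len(string.ascii_uppercase)  # 26, upper case only
digits = len(string.digits)            # 10

# Whatever is left can go to space, punctuation, and controls.
left_for_punctuation = code_points - letters - digits

print(code_points, letters + digits, left_for_punctuation)
```

Adding a second case of 26 letters would leave only 2 spare values out of 64, which is one way to see why these codes were upper case only.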
> a zero byte is ^@, for example, because...

A zero byte is ^@ because 0x00 + 64 = '@'. The same pattern holds for all C0 control codes.
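The offset rule can be written out as a small Python sketch. The helper name `caret_notation` is an assumption for illustration, and it only covers the C0 range 0x00-0x1F.

```python
def caret_notation(code: int) -> str:
    """Return the caret form of a C0 control code, e.g. 0x00 -> '^@'."""
    if not 0x00 <= code <= 0x1F:
        raise ValueError("not a C0 control code")
    # Adding 64 (0x40) lands on the printable column '@', 'A'..'Z', '[', ...
    return "^" + chr(code + 64)

for code in (0x00, 0x09, 0x0A, 0x1B):
    print(hex(code), caret_notation(code))
```

For example, tab (0x09) comes out as `^I` and escape (0x1B) as `^[`, matching what terminals commonly display.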

Yes, and why is '@' at 0x40?
There must be some alternate universe where WWII never happened, all the talented Hungarian and Polish mathematicians, logicians etc. stayed home, and computer parts and applications carry names like Emlékezet or Wrzeszcz.
Spanish fascism doesn't win, so the Spanish left sides with Republican France. The Nazis get far less support and are crushed fast, many years before 1945. As for Spain itself, it doesn't suffer a war, a postwar period, and a National-Catholic ruralist shithole regime. There are no 15-20 years of backwardness relative to France and the rest of Europe lasting until 1986, when Spain joined the pre-EU (postwar Spain in 1940-1950 was almost on par with the Europe of 1910-1920, modulo the boost in the 60s from tourism), so Spain becomes a role model for South America. There is no polarized left and right on that continent, so it achieves European-level standards of living. People merge Iberian humanism with German engineering.

People like Torres Quevedo turn up everywhere, because there are no anti-scientific people dragging education back to the level of the 18th century and before. I am not kidding. Pure creationism under Franco. By law. If you said something against religion, you were fined, jailed, or beaten up.

>early character encodings like 7-bit ASCII

Early character encodings were 6-bit, with no lower case.

Spanish isn't much bigger...