Hacker News
by snazz 2746 days ago
For accents to be removed, there can't be any ambiguity. For instance, in Spanish, the words cómo and como, or the many forms of "porque" (with very different meanings) are a source of confusion for many speakers. This wouldn't be any easier without accents. I think that a language would have to be designed from the ground up to get rid of accents for this to be possible.
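One way to gauge how much ambiguity accent removal would introduce is to strip the marks from a word list and see which words collide. Below is a minimal Python sketch. It assumes that Unicode NFD decomposition followed by dropping combining marks is an acceptable definition of "removing accents". `strip_accents` and `accent_collisions` are hypothetical names, and the sample words are only illustrations.

```python
import unicodedata
from collections import defaultdict

# Hypothetical helper names; a sketch, not a library API.


def strip_accents(word: str) -> str:
    """Decompose to NFD and drop combining marks (note: this also maps ñ to n)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_collisions(words):
    """Map each accentless form to the distinct spellings that collapse into it."""
    groups = defaultdict(set)
    for word in words:
        w = unicodedata.normalize("NFC", word.lower())
        groups[strip_accents(w)].add(w)
    # Keep only forms shared by two or more spellings: those are the ambiguities.
    return {bare: sorted(spellings) for bare, spellings in groups.items() if len(spellings) > 1}


if __name__ == "__main__":
    sample = ["cómo", "como", "papá", "papa", "casa"]
    for bare, spellings in sorted(accent_collisions(sample).items()):
        print(bare, "<-", spellings)
```

Run on a full dictionary, the size of the returned mapping is a rough measure of how many words would rely on context alone. The same function also shows what dropping č, š and ž does to Slovenian text: `strip_accents("čšž")` gives `"csz"`.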

A funny thing happened in Slovenian, particularly colloquial Slovenian. We have accents. Many accents. Each vowel has several different pronunciations and sometimes those completely change the meaning of a word. Or they make the text flow better. Or it's an accent thing.

Either way the language has many accents in writing.

But over time, those accents are disappearing. Written Slovenian from the 19th century is absolutely littered with them. Modern Slovenian in colloquial writing is starting to lose even the č, š, ž accents.

Interestingly, people don't compensate with things like cz, or ch, or cx. They rely on context informing the reader how to pronounce a word.

I believe the loss of accents on vowels happened because they're not that necessary. The loss of č š ž is happening because of computers. Takes an extra keypress to type those. On iOS/Android it takes a long press and who has time for that when typing a text? Nobody. So we don't.

Could Spanish not work similarly? Do Spanish people write out all the accents when sending a text?

I know my French girlfriend doesn't always write all her accents and French is also chock full of accents.

Something similar is happening in Vietnamese, and I think smartphones are to blame. Writing properly accented Vietnamese on mobile keyboards, particularly the iPhone's, is pretty tedious, so people often leave out the accents and rely on context to figure out meanings. Unfortunately, for a language like Vietnamese, which has tons of monosyllabic words and where the accent marks completely change the meaning of a word, this can lead to a lot of ambiguity.

Why a company with Apple's resources can't be bothered to implement proper autocorrect for a big market with 80+ million native speakers is a mystery to me.

I could be wrong, but don't people have autocorrect on their phones that will correct the character based on context? Is that even possible?

Let's say somebody wants to say "how I eat" in Spanish. The correct way to write it would be "cómo como": "cómo" means "how", and "como" means "I eat". It wouldn't make sense to say "como cómo", so, in theory, autocorrect should feel free to correct every instance of "como como" to "cómo como". Only if it becomes an international household brand name will this ever be a problem, for this one phrase at least.
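The rule described above can be sketched as a simple pattern substitution. This is a hypothetical illustration in Python, not how any real keyboard's autocorrect works, and `naive_autocorrect` is an invented name. It also exposes the weakness of looking at only two words. The rule would rewrite "como como un mono" ("I eat like a monkey"), where both words are correctly unaccented.

```python
import re

# Hypothetical two-word rule: an unaccented "como como" is assumed to mean
# "cómo como" ("how I eat"), because "como cómo" makes no sense.
COMO_COMO = re.compile(r"\b([Cc])omo como\b")


def naive_autocorrect(text: str) -> str:
    """Rewrite 'como como' as 'cómo como', keeping the case of the first letter."""
    return COMO_COMO.sub(lambda m: m.group(1) + "ómo como", text)


if __name__ == "__main__":
    print(naive_autocorrect("Como como"))          # Cómo como
    print(naive_autocorrect("como como un mono"))  # cómo como un mono (wrong here)
```

Getting the second case right would need more context than a fixed two-word window.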

AFAIK autocorrect doesn’t work for Slovenian. And even if it does, most people I know have it disabled, because our colloquialisms use a lot of English, some German, plenty of Serbo-Croatian, and sometimes Italian. We often spell those loan words our own way.

This combination of languages and intentionally incorrect spellings makes autocorrect total trash.

"como como un mono" (I eat like a monkey)
There is a tendency to confuse this (apparent) simplification with "evolution" or "progress".

Don't fool yourself, though.

Accents are there for a reason.

Orthography influences pronunciation. In time people will start pronouncing those words as the orthography suggests rather than deducing it from the context.

Even if only because the context won't always be discernible. But, generally, because of the principle of least effort: it's always easier to just read what is there than to think about which pronunciation applies.

Eventually, the words will become homophonous (edit: assuming there are other words which differ only in the accents), and you'll effectively lose the words, or they'll change, probably for the worse.

The language will become more ambiguous and more dependent on the context knowledge - which will be hard to get if you don't know the language well to begin with.

In other words, you've just made the language "harder" to learn.

So, in effect, it's not a simplification at all.

> Orthography influences pronunciation. In time people will start pronouncing those words as the orthography suggests rather than deducing it from the context.

is there any evidence for that?

anecdotal evidence in English, for example, suggests just the opposite: light -> lite, etc.

however, learning a language as a child growing up vs. as a second language later in life are quite different, and so the dynamics that affect language change are very different too.

http://jbr.me.uk/ranto/m.html explains how Esperanto is unlikely to change, and also why that would be a good thing.

back to your argument, i don't think the words with different pronunciation would be lost, but certainly the language would be harder to learn.

English is atypical in its irregular pronunciation rules IMO. At least compared to Latin languages. And it doesn't have accents that change the pronunciation in otherwise similar words.

As such, people are aware that you just have to know how to pronounce every particular word, rather than relying solely on orthography.

Anyway, your example isn't very good: "light" and "lite" are homophonous anyway.

A better one would be "calm". The "l" is almost silent. Presumably, one could "simplify" the spelling to "cam", and you would then pronounce it "kom" or "kam" depending on context.

I claim that one of the pronunciations would disappear, sooner or later.

If you're asking for a "scientific study", I don't have one and I don't even know where such a thing can be found.

But the country I'm from has had three orthographic reforms in the 20th century, the last one being all about removing supposedly "mute" consonants, which in fact acted like accents in that they altered the pronunciation of the word.

Exactly! And I see that typewriters were invented in 1878, so the difficulty of typing Esperanto on a typewriter was most probably not taken into account when Esperanto was created.

You can potentially replace them with digraphs if the digraphs aren't used for some other purpose, which some Esperantists have done with the x-method, like gxis for ĝis 'until', although some people find that quite ugly.
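The x-method mentioned above is a mechanical substitution. Because x is not a letter of the Esperanto alphabet, it can be reversed without ambiguity for native Esperanto words. Below is a minimal Python sketch. The function names are hypothetical, and capitalised digraphs are assumed to be written "Cx" or "CX". Names or loanwords that contain a real x would be mangled on decoding.

```python
import unicodedata

# Hypothetical helper names. Esperanto's six accented letters and their x spellings.
TO_X = {
    "ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux",
    "Ĉ": "Cx", "Ĝ": "Gx", "Ĥ": "Hx", "Ĵ": "Jx", "Ŝ": "Sx", "Ŭ": "Ux",
}
FROM_X = {digraph: letter for letter, digraph in TO_X.items()}
# Also accept all-caps digraphs such as "CX" when decoding.
FROM_X.update({digraph.upper(): letter for letter, digraph in TO_X.items() if letter.isupper()})


def to_x_system(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # make sure ĝ etc. are single code points
    return "".join(TO_X.get(ch, ch) for ch in text)


def from_x_system(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in FROM_X:  # a letter followed by x: restore the accented letter
            out.append(FROM_X[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, `to_x_system("ĝis")` returns `"gxis"`, and `from_x_system("gxis")` turns it back into `"ĝis"`.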

An interesting example to me is that pinyin uses diacritical marks to indicate tones in Chinese, which can be hard to type on a limited system but also hard for Chinese learners to remember. The Gwoyeu Romatzyh system has different spellings for each vowel depending on the associated tone!
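A common workaround for the typing problem is to enter tone numbers (ma3) and convert them to marks afterwards. Below is a minimal Python sketch of that conversion. It assumes lowercase input, with v standing in for ü. The placement rule it uses is the usual pinyin convention: a or e takes the mark if present, otherwise the o of ou, otherwise the last vowel. The function names are hypothetical.

```python
import re

# Hypothetical helper names. Tone-marked forms of each vowel for tones 1-4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable: str, tone: int) -> str:
    """Place the mark for tone 1-4 on the correct vowel; 0 or 5 means neutral tone."""
    syllable = syllable.replace("v", "ü")  # common ASCII stand-in for ü
    vowels = [i for i, ch in enumerate(syllable) if ch in TONE_MARKS]
    if tone not in (1, 2, 3, 4) or not vowels:
        return syllable
    if "a" in syllable:
        target = syllable.index("a")
    elif "e" in syllable:
        target = syllable.index("e")
    elif "ou" in syllable:
        target = syllable.index("o")
    else:
        target = vowels[-1]  # e.g. "gui4" -> "guì", "liu2" -> "liú"
    marked = TONE_MARKS[syllable[target]][tone - 1]
    return syllable[:target] + marked + syllable[target + 1:]


def numbered_to_marked(text: str) -> str:
    """Convert numbered pinyin such as 'ni3 hao3' to 'nǐ hǎo'."""
    return re.sub(r"([a-zü]+)([0-5])",
                  lambda m: mark_syllable(m.group(1), int(m.group(2))), text)
```

Typing ASCII with numbers sidesteps the keyboard problem. It does not help with remembering the tones, which is the other difficulty raised above.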

https://en.wikipedia.org/wiki/Spelling_in_Gwoyeu_Romatzyh#To...

This is presumably harder to learn but easier for learners to remember. Similarly, Finnish uses double letters to mark a long vowel instead of a macron (ā), as in maa 'country', which other languages might write as mā. On the other hand, there are also the vowels ä and ö, which are different from a and o. To spell Finnish without these marks, one would need to find some unused digraph, which might actually be a big challenge, since Wikipedia says:

> The Germanic umlaut or convention of considering digraph ae equivalent to ä, and oe equivalent to ö is inapplicable in Finnish. Moreover, in Finnish, both ae and oe are vowel sequences, not single letters, and they have independent meanings (e.g. haen "I seek" vs. hän "he, she").
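The collision described in the quote is easy to check mechanically. Below is a minimal Python sketch that assumes a German-style mapping of ä to ae and ö to oe. The function names are hypothetical, and the word list is just the example from the quote plus maa.

```python
# Hypothetical helper names. German-style replacements that, per the quote,
# do not work for Finnish.
GERMAN_STYLE = {"ä": "ae", "ö": "oe"}


def to_digraphs(word: str) -> str:
    return "".join(GERMAN_STYLE.get(ch, ch) for ch in word)


def digraph_clashes(words):
    """Return (existing, new) pairs of distinct words that share a digraph spelling."""
    first_seen = {}
    clashes = []
    for word in words:
        key = to_digraphs(word)
        if key in first_seen and first_seen[key] != word:
            clashes.append((first_seen[key], word))
        else:
            first_seen.setdefault(key, word)
    return clashes


if __name__ == "__main__":
    print(digraph_clashes(["haen", "hän", "maa"]))  # [('haen', 'hän')]
```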

If one wanted to write Spanish without accent marks, it might be possible to find digraph equivalents, such as ou for ó (which is a problem in "estadounidense" but almost nowhere else!). The ñ could be written as nh, as in Portuguese (señor/senhor).
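You can test whether a scheme like this is safe by encoding and then decoding words and checking that they come back unchanged. Below is a minimal Python sketch of the ou/nh idea. The function names are hypothetical, and it covers only ó and ñ.

```python
# Hypothetical digraph stand-ins: ó -> ou, ñ -> nh.
ENCODE = {"ó": "ou", "ñ": "nh"}


def encode(word: str) -> str:
    return "".join(ENCODE.get(ch, ch) for ch in word)


def decode(word: str) -> str:
    # "ou" and "nh" share no letters, so two plain replacements are enough.
    return word.replace("ou", "ó").replace("nh", "ñ")


def round_trips(word: str) -> bool:
    """True if encoding and then decoding gives back the original word."""
    return decode(encode(word)) == word


if __name__ == "__main__":
    for w in ["cómo", "señor", "estadounidense", "anhelo"]:
        print(w, round_trips(w), decode(encode(w)))
```

The check flags estadounidense, which decodes as estadónidense. It also flags ordinary words with a silent h after n, such as anhelo ('longing'), which would decode as añelo.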

This would work for ñ, but it wouldn’t necessarily work for replacing accents unless any vowel followed by u is treated as an accented vowel, which leads to the estadounidense problem you identified.

From what I remember from high school Spanish, a Spanish word without a written accent has a default syllable with an “invisible accent” on its vowel, and the purpose of a written accent is to change which syllable gets emphasized.
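The default-stress rule can be written down in a few lines. Below is a simplified Python sketch. It assumes the caller supplies the word already split into unaccented syllables, plus the index of the stressed one; both are hypothetical input conventions. It encodes the standard rule: words ending in a vowel, n or s are stressed on the next-to-last syllable, and other words on the last. It ignores accents that mark a vowel hiatus (día) or that distinguish words (cómo/como).

```python
from __future__ import annotations

# Hypothetical helper names; syllabification is assumed to be done by the caller.


def default_stress(syllables: list[str]) -> int:
    """Index of the syllable that is stressed when no accent is written."""
    if len(syllables) == 1:
        return 0
    if syllables[-1][-1].lower() in "aeiouns":
        return len(syllables) - 2  # ends in a vowel, n or s: next-to-last syllable
    return len(syllables) - 1  # any other ending: last syllable


def needs_written_accent(syllables: list[str], stressed: int) -> bool:
    """A written accent is needed when the real stress differs from the default."""
    return stressed != default_stress(syllables)


if __name__ == "__main__":
    print(needs_written_accent(["can", "cion"], 1))  # True  -> canción
    print(needs_written_accent(["ar", "bol"], 0))    # True  -> árbol
    print(needs_written_accent(["ca", "sa"], 0))     # False -> casa
```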

Yes, we'd ideally need to find a digraph that absolutely doesn't occur in Spanish. This can be tricky with compound words and loanwords. It seems that ou, oe and oo are super-rare in Spanish morphemes but can occur in loanwords and compounds. I just searched for unaccented digraphs that literally don't occur at all in /usr/share/dict/spanish and the only examples (of which there are 184 excluding k and w) contain only consonants and y.
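The search described above can be reproduced along these lines. Below is a Python sketch. It assumes a plain word list at /usr/share/dict/spanish, one word per line, in UTF-8. That file is not present on every system, so the script checks for it first. `absent_digraphs` is a hypothetical name, and by default it considers only unaccented a-z, so the exact count depends on the word list and the alphabet used.

```python
from itertools import product
from pathlib import Path
from string import ascii_lowercase

# Hypothetical helper name; the word-list path is an assumption.


def absent_digraphs(words, alphabet=ascii_lowercase):
    """Return the two-letter sequences over `alphabet` that never occur in `words`."""
    present = set()
    for word in words:
        w = word.strip().lower()
        present.update(w[i:i + 2] for i in range(len(w) - 1))
    return sorted(a + b for a, b in product(alphabet, repeat=2) if a + b not in present)


if __name__ == "__main__":
    word_list = Path("/usr/share/dict/spanish")
    if word_list.exists():
        # Leave out k and w, as in the count quoted above.
        letters = "".join(c for c in ascii_lowercase if c not in "kw")
        with word_list.open(encoding="utf-8", errors="replace") as f:
            missing = absent_digraphs(f, letters)
        print(len(missing), missing[:20])
```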

So, there's not any easy natural way to do this without creating at least some ambiguities.

In practice, native Spanish speakers leave off accents in text messages and the like without causing much confusion.

Spoken Japanese, in my very limited experience, runs into this issue because of the number of homophones, and the only recourse is context clues to distinguish their meaning. So there can be ambiguity in an otherwise functional language, although it certainly makes things harder.

True, though native speakers do distinguish a good swath of "homophones" by differing pitch accent: e.g., in the standard dialect, あめ means "rain" if the pitch drops on め or "candy" if it stays level. To a native speaker, these sound as different as the two ways of saying "present".
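The rain/candy contrast follows from the accent-number notation used in many Japanese dictionaries. In that notation, 0 means the pitch never drops (heiban) and n means it drops after the n-th mora. Below is a simplified Python sketch of standard (Tokyo) pitch patterns under that assumption. `tokyo_pitch` is a hypothetical name, and the pitch of a following particle is not modeled.

```python
# Hypothetical helper name; models only the word itself, not following particles.


def tokyo_pitch(morae: list, accent: int) -> str:
    """Return an H/L pitch pattern for a word, given its dictionary accent number."""
    pattern = []
    for i in range(1, len(morae) + 1):
        if accent == 1:
            high = i == 1  # high on the first mora, then a drop
        elif i == 1:
            high = False  # otherwise the first mora is low
        else:
            high = accent == 0 or i <= accent  # stays high until the accented mora
        pattern.append("H" if high else "L")
    return "".join(pattern)


if __name__ == "__main__":
    print(tokyo_pitch(["あ", "め"], 1))  # HL: 雨, "rain" (pitch drops on め)
    print(tokyo_pitch(["あ", "め"], 0))  # LH: 飴, "candy" (no drop)
```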

If you grab a native Japanese dictionary that has accent indicators, like 新明解国語辞典, there really are surprisingly few true homophones in a typical vocabulary.

Interesting. Thank you for this.