Hacker News
by microcolonel 3283 days ago
Sometimes I get i18n fatigue too. I think the world would be a better place if everyone's languages fit in ASCII.

That said, the cat's kinda out of the bag. UTF-8 is at least well-done, and the algorithms are widely available. I study Japanese and have started studying Russian and Chinese; I think maybe the best way to convince people to learn English is to walk the walk. Who knows, maybe everything will go very wrong again before we get a chance to standardize.

I'm also working on an engineered language with a test suite/corpus maintained alongside the language. Maybe in the ashes of the old new world there'll be room for something like this.

2 comments

English doesn't even fit in ASCII.

To write it properly, we need left- and right-facing single and double quotes, diaereses, cedillas and accents for words like naïve, façade and café, en- and em-dashes and the ellipsis.

Longer documents will require symbols like † and ‡, bullets and §. The currency symbols £, €, ¢ and ₹ are used by countries where English is an official language.
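For anyone who wants to check this themselves, here is a rough Python sketch that lists which characters in a string fall outside 7-bit ASCII and how many bytes each takes in UTF-8. The helper names (`non_ascii_chars`, `utf8_len`) and the sample list are made up for illustration; only `str.isascii`, `ord` and `str.encode` are standard library.

```python
# Hypothetical helpers for illustration; not from any library.

def non_ascii_chars(text: str) -> list[str]:
    """Return the characters in text that fall outside 7-bit ASCII."""
    return [ch for ch in text if ord(ch) > 0x7F]


def utf8_len(ch: str) -> int:
    """Number of bytes the given string occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # Sample words and symbols taken from the comment above.
    samples = ["naïve", "façade", "café", "†", "‡", "§", "£", "€", "¢", "₹"]
    for s in samples:
        for ch in non_ascii_chars(s):
            print(f"{s!r}: U+{ord(ch):04X} takes {utf8_len(ch)} byte(s) in UTF-8")
```

Plain `str.isascii()` gives the yes/no answer; the helpers just show which characters are responsible.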

I can't even use symbols like that anyhow (I deal in USD, CAD, and NTD). I end up using ISO 4217 codes everywhere. You missed ¥ as well, for which I would use JPY or CNY.
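A small sketch of why codes can be less ambiguous than symbols, assuming a hand-written and deliberately non-exhaustive table. The table and function names are hypothetical; the three-letter codes themselves are real ISO 4217 codes (note that the ISO code for the New Taiwan dollar is TWD).

```python
# Hypothetical, non-exhaustive lookup table for illustration only.
# One symbol can stand for several currencies; the ISO 4217 code cannot.
SYMBOL_TO_CODES: dict[str, list[str]] = {
    "$": ["USD", "CAD", "TWD"],
    "£": ["GBP"],
    "€": ["EUR"],
    "₹": ["INR"],
    "¥": ["JPY", "CNY"],
}


def candidate_codes(symbol: str) -> list[str]:
    """Return the ISO 4217 codes this table associates with a symbol."""
    return SYMBOL_TO_CODES.get(symbol, [])


def is_ambiguous(symbol: str) -> bool:
    """True when the symbol alone does not identify a single currency."""
    return len(candidate_codes(symbol)) > 1


if __name__ == "__main__":
    for symbol in SYMBOL_TO_CODES:
        note = "ambiguous" if is_ambiguous(symbol) else "unique in this table"
        print(symbol, candidate_codes(symbol), note)
```

Every code in the table is pure ASCII, which is the practical appeal here.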
And English speakers need to talk about things from non-English-speaking countries sometimes!
Some days, I dream of a world where the ancient Chinese were exposed to alphabetic scripts and decided that was a good idea, instead of sticking with characters.
Though honestly, I kinda like ideographic languages. If the symbols could be enumerated in a byte and leave space for delimiters and punctuation, then I'd be down to be globally colonized by an ideographic language. Really the only difficult ones (for computers) are abugidas, abjads, and whatever Thai is written in (and I suppose it is a bit of a pain to compose Hangul, and there are still more jamo than would fit in an ASCII-sized encoding).
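On composing Hangul: for modern syllables it is arithmetic over the conjoining jamo blocks. Below is a minimal sketch of the composition formula from the Unicode Standard (the constants are the ones the standard uses; the function names `compose_syllable` and `matches_nfc` are made up here). It only handles modern conjoining jamo (U+1100 block), not compatibility jamo or archaic letters.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no trailing consonant"


def compose_syllable(lead: str, vowel: str, tail: str = "") -> str:
    """Compose a precomposed Hangul syllable from conjoining jamo.

    Hypothetical helper: lead, vowel and the optional tail must be single
    modern conjoining jamo code points.
    """
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(tail) - T_BASE if tail else 0
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("lead or vowel is not a modern conjoining jamo")
    if tail and not (1 <= t_index < T_COUNT):
        raise ValueError("tail is not a modern trailing jamo")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def matches_nfc(lead: str, vowel: str, tail: str = "") -> bool:
    """Cross-check the arithmetic against the standard library's NFC."""
    return compose_syllable(lead, vowel, tail) == unicodedata.normalize(
        "NFC", lead + vowel + tail
    )


if __name__ == "__main__":
    # U+1112 + U+1161 + U+11AB compose to U+D55C.
    print(hex(ord(compose_syllable("\u1112", "\u1161", "\u11ab"))))
```

In practice `unicodedata.normalize("NFC", ...)` already does this; the sketch just shows that the mapping is a closed-form index calculation over 19 x 21 x 28 = 11172 syllables.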
Thai is an abugida, descended from Brahmi just like most of the rest of the South and South-East Asian scripts. It's probably the one with the most additional stuff to consider, but fundamentally it's the same kind of script.