by nrinaudo 4305 days ago
Japanese is often encoded in Shift_JIS, which is much more compact than UTF-8 for Japanese text (typically 2 bytes per character instead of 3). Most browsers default to that encoding on Japanese OSes, so there probably isn't a real bandwidth issue, depending on the ratio of browser to dedicated client usage.
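
To make the size difference concrete, here is a minimal sketch that counts encoded bytes for a few sample strings. The helper name `encoded_sizes` and the sample strings are illustrative assumptions, not something from the thread; the only library feature used is Python's built-in `shift_jis` and `utf-8` codecs.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes `text` occupies in Shift_JIS and in UTF-8.

    `encoded_sizes` is a hypothetical helper written for this illustration.
    """
    return {enc: len(text.encode(enc)) for enc in ("shift_jis", "utf-8")}


if __name__ == "__main__":
    # Kanji, full-width katakana, and plain ASCII for comparison.
    for sample in ("日本語", "カタカナ", "hello"):
        print(sample, encoded_sizes(sample))
```

Kanji and full-width kana take 2 bytes each in Shift_JIS and 3 bytes each in UTF-8, while plain ASCII text takes 1 byte per character in both, so the saving only applies to the Japanese part of a message.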

As for ideograms, your statement is not correct. Kanjis allow for better content/character ratios, certainly, but one kanji is very often not one word - the word foreigner, for example, uses 3 ideograms.

On top of that, Japanese is not written solely with kanjis (as opposed to Chinese, for example). It also uses katakanas and hiraganas, which stand for phonemes. This is more often the case on social networks where a lot of western words are used - western words are almost systematically written with katakanas.

Kanas are still more "efficient" than alphabets, but to reuse my previous example, foreigner is written using 5 kanas instead of 3 kanjis.


I know it's such a tiny point, and very much off topic, but please don't "pluralize" words such as Kanji, Katakana, etc. As someone who has spent a lot of time learning Japanese, and who knows that Japanese words don't change between singular/plural (with certain exceptions, such as attaching "-tachi" to a word), seeing you write "Kanjis" or "Katakanas" so many times in a row really bothers me.

I'll go back to my corner and leave you alone now.

Well, he's speaking English, not Japanese. Do you refer to multiple pizzas as pizze?
It's more like the plural of 'deer' being 'deer'. I don't think I've ever heard anyone attach an 's' to 'kanji' for pluralization when speaking English. Like how it might be odd to say 'sushis' or 'wasabis'. Whenever I need to stress plurality I would say 'kanji characters' or something like that. Oxford English Dictionary lists 'sushi', 'kanji', 'shinkansen', 'katakana' all as being mass nouns or having the plural form the same as the singular. The exception in the words I looked up was 'tsunami' which may be pluralized as 'tsunami' or 'tsunamis'.
Most of those are the sorts of nouns that wouldn't normally be pluralized in English. "Sushi" is like "rice" — specifies what the roll is made of rather than the roll itself. We don't pluralize the name of a rail system like Shinkansen because there is only one of it (similarly, "the L" but not "the Ls"). But there are many Japanese loanwords that are commonly pluralized differently in English. For example, futons, tycoons, typhoons, tatamis, ninjas and kimonos.

It's ambiguous whether "kanji" is a mass noun referring to the character set as a whole or a singular noun referring to a character in the set. I think it's both. So it seems hard to blame someone for being unclear on the matter.

Eh, English has a long history of pedants insisting that the original pluralization of loanwords be used. If people are going to push for indices instead of indexes, octopodes instead of octopuses, and rooves instead of roofs, there's no reason not to use the Japanese pluralization of loanwords when appropriate.
Actually, you're wrong. The plural of "kanji" in English is "kanji": http://www.merriam-webster.com/dictionary/kanji
That was not the basis of FreezerburnV's complaint, which was based on how it's done in Japanese, not English. So I don't believe SunShiranui's point is wrong. The fact that the plural of "kanji" is "kanji" in English does not establish a blanket rule that every word must be pluralized according to its language of origin. (And that is good, because pluralizing "cherry" would be a nightmare! It's an over-singularized form of the already-singular French "cherise.")
I apologise, I was not aware you could not pluralise these words. My only excuse is that English is not my first language.

I'd go back and edit my comment, but that'd just make yours seem weird...

English has many anomalies and plural forms are often strange. That said, in the texts I've read, they did not add an 's' to kana, kanji, katakana or hiragana, but used them as their own plural forms.

I wouldn't have guessed that you were not a native speaker, though. You write well.

Just for information, be aware that you should not write 'softwares' or 'informations' either. Those aren't words.

So is Twitter sending Shift_JIS encoded responses back? That seems unlikely, given that it seems easier to engineer a service that accepts anything but always sends UTF-8 back, but it's definitely interesting if that's the case.
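
As a rough illustration of the "accept anything, always send UTF-8" approach, here is a minimal sketch. It is not Twitter's actual implementation; the function name `normalize_to_utf8` and the idea that the client declares its charset alongside the request are assumptions made for the example.

```python
def normalize_to_utf8(body: bytes, declared_charset: str = "utf-8") -> bytes:
    """Decode `body` using the charset the client declared, then re-encode as UTF-8.

    `normalize_to_utf8` is a hypothetical helper. A real service would also
    need to decide what to do when the declared charset is wrong or unknown;
    here a bad input simply raises UnicodeDecodeError or LookupError.
    """
    text = body.decode(declared_charset)
    return text.encode("utf-8")


if __name__ == "__main__":
    incoming = "日本語".encode("shift_jis")  # what a Shift_JIS client might send
    outgoing = normalize_to_utf8(incoming, "shift_jis")
    print(len(incoming), len(outgoing))
```

With this design the server stores and serves a single encoding, at the cost of larger responses for Japanese text than a Shift_JIS reply would need.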

Also, why downvote? Even if it was largely wrong, the discussion engendered was a net positive to HN.

I think Twitter generates answers based on the client's Accept-Charset header. Well, I assumed that. Come to think of it though, it does seem unlikely and you make a good point.

I did not downvote, or if I did, I did not mean to; OP's comment certainly did not deserve it.