Hacker News
by whym 3380 days ago
This article does a great job of presenting the gist of Japanese sentence structure. Nevertheless, it makes me want to point out that it's not the whole story. If you take into account topics such as modality and conjugation, some of the information you add to a verb is placed after the verb and cannot be freely reordered.

Japanese verbs are "greater" than English verbs in the sense that you conjugate a verb, or attach suffixes to it, to express negation, conjunctions, conditional forms, etc., making it longer and longer: https://en.wikipedia.org/wiki/Japanese_verb_conjugation

In contrast, English has a relatively simple set of verb inflections. Many of those Japanese verb forms and long suffixed verbs are translated into multi-word phrases. Compare:

Anata wa kyou nemuru. (You sleep today.) -- verb is in normal form ("nemuru")

Anata wa kinou nemurenakatta. (You were not able to sleep yesterday.) -- verb is in continuative form ("nemuru"→"nemu") + possibility suffix ("reru"→"re") + negation suffix ("nai"→"naka") + past suffix ("ta"→"tta")
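
The suffix chain in the second example can be sketched in a few lines of Python. This only replays the breakdown given in the comment (stem + possibility + negation + past); `MORPHEMES` and `build` are hypothetical names made up for this sketch, not part of any real morphological analyzer.

```python
# Toy illustration of agglutination, following the comment's own breakdown.
# MORPHEMES and build() are hypothetical names used only in this sketch.
MORPHEMES = [
    ("nemu", "continuative form of nemuru (to sleep)"),
    ("re", "possibility suffix, from reru"),
    ("naka", "negation suffix, from nai"),
    ("tta", "past suffix, from ta"),
]


def build(morphemes):
    """Concatenate the morphemes in the given order."""
    return "".join(form for form, _gloss in morphemes)


word = build(MORPHEMES)
print(word)
for form, gloss in MORPHEMES:
    print(f"  {form:5} {gloss}")
```

The list is ordered on purpose: unlike the subject and object of the sentence, these pieces cannot be shuffled.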

The name for what you're talking about is "morphological typology". Languages occupy a spectrum from analytic (words stay the same) to synthetic (words change). English is usually categorized as analytic, since we only have a few ways to change words: plural -s, past tense -ed, etc., and English has been getting more analytic over time. Other European languages are more synthetic (fusional), like French, German, Spanish, etc. Japanese, Finnish, Hungarian are very synthetic (agglutinative). Chinese (all varieties) is on the opposite end of the spectrum, and it's much more analytic than English.
Interesting. Considering language as a lever and a tool for thought, how do you think the position of a language on the analytic/synthetic spectrum affects its suitability for computer interpretation?

Are there any major Japanese programming languages? It would seem to me that given our relatively primitive compilers, that 'analytic' languages offer simpler mediums for unambiguous programming, whereas 'synthetic' languages may bear greater nuance of expression when we can 'think' programs into existence in the future.

I advise extreme caution when going down this path... it's rife with established lines of pseudoscientific thought. For example, a couple centuries ago you could talk about how English just isn't complex enough to express great ideas the way Latin and Greek are.

Ambiguity is an orthogonal concept to morphological typology, however. There is no extra "nuance of expression" just because you conjugate words more, that's just nonsense.

There's one pretty famous and you may already know it.

> Ruby is a language of careful balance. Its creator, Yukihiro “Matz” Matsumoto, blended parts of his favorite languages (Perl, Smalltalk, Eiffel, Ada, and Lisp) to form a new language that balanced functional programming with imperative programming.

> He has often said that he is “trying to make Ruby natural, not simple,” in a way that mirrors life.

> Building on this, he adds: Ruby is simple in appearance, but is very complex inside, just like our human body.

These are not strictly scientific articles, but they appear reasonable.

[http://carlosplusplus.github.io/blog/2013/08/01/ruby-and-the...] - [https://robots.thoughtbot.com/learning-japanese-the-rubyist-...]

>Interesting. Considering language as a lever and a tool for thought, how do you think the position of a language on the analytic/synthetic spectrum affects its suitability for computer interpretation?

Interesting question. I remember reading an article many years ago in BYTE magazine in which someone, maybe one of the regular columnists who wrote about programming languages, speculated that if a programming language came out of the Orient, it might differ in some interesting ways from existing ones, which mainly came from the West (though of course there could have been Oriental contributors to existing ones). For a long time after that I never heard of any such language; then came Ruby.

Well, APL is damned near polysynthetic, if that's your bag. But like Cree or Ojibwe, you won't really "speak" it fully, fluently and with nuance until adolescence.
Ruby comes from Japan.
> English is usually categorized as analytic, since we only have a few ways to change words

Words can be stressed in many different ways in an English sentence, and the pattern of stresses in a sentence determines much of the meaning in the surrounding context that isn't communicated by word inflections such as -s and -ed. In programming languages, this would be equivalent to marking up identifiers in different ways (using bolding, italicizing, underlining, etc) instead of using prefix, suffix, and circumfix syntax operators and punctuation.

For Japanese, you learn a handful of rules to conjugate a verb. Then those rules always apply with no exceptions. In English, there are no set rules. Every case is special.

What is the past tense form of 'shake'? 'see'? 'walk'? 'sleep'? 'eat'? 'speak'? 'sit'? 'seek'? 'work'?

There are no rules. You basically have to memorize every word and all the possible ways it can morph.
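
For the nine verbs in the question above, the answers can be written down as a small lookup table plus a fallback. This is a minimal sketch, not a general English inflector: `IRREGULAR_PAST` and `past_tense` are hypothetical names, and the fallback rule is deliberately naive (it ignores consonant doubling, y → ied, and so on).

```python
# Irregular past-tense forms for the verbs listed in the question above.
# Anything not in this table falls through to a naive "-ed" rule.
IRREGULAR_PAST = {
    "shake": "shook",
    "see": "saw",
    "sleep": "slept",
    "eat": "ate",
    "speak": "spoke",
    "sit": "sat",
    "seek": "sought",
}


def past_tense(verb):
    """Return a memorized irregular form, else append -d / -ed."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    return verb + "d" if verb.endswith("e") else verb + "ed"


VERBS = ["shake", "see", "walk", "sleep", "eat", "speak", "sit", "seek", "work"]
for verb in VERBS:
    print(verb, "->", past_tense(verb))
```

Seven of the nine have to be memorized; only "walk" and "work" follow the regular pattern.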

English possessive: 's or s' or ' or s, depending. Japanese possessive: no

English plural: different for every word. Japanese plural: same as singular, or throw on a -tachi

English past tense: different for every word. Japanese past tense: -mashita for verbs, deshita for adverbs/adjectives

In English, nothing is simple. In Japanese, a multi-word phrase may have more syllables, but at least it will always be the same rule.

You're forgetting that there are only 3 (three) verb conjugations in English, of which two are almost always the same. Only a finite number of verbs have irregular conjugations, so you just learn them along with vocabulary. In Japanese, the number of possible conjugations of all irregular verbs (copula, "suru", "kuru") is probably larger than the number of English irregular verbs that are commonly used. In fact, let's count the number of conjugations of "kuru" in my ICHIRAN [1] database:

    ICHIRAN/DICT> (length (get-kana-forms 1547720))
    186

Are English possessives considered difficult by anyone? Not sure what that demonstrates.

Plurals! Oh, that's my favorite topic that I'm working on right now. -tachi is mostly used with people, so it can't be used in most contexts. For inanimate objects you just say the number of them. And that's where the counters come in... At which point any sane person gives up learning Japanese for good.

Past tense, isn't that the same as conjugations? Also your rules don't really work. "tanoshii" => "tanoshiideshita"? Pretty sure that's not a word. The correct past tense is "tanoshikatta [desu]".

[1] https://github.com/tshatrov/ichiran

We have counters in English too!

Tons of words are uncountable, like water, bread, and so on.

A slice of bread, a loaf of bread, a bread roll (Hey, why did that one come after the 'bread'...)

We even have lots of words that are both countable and uncountable. "I ate some tomato" and "I ate some tomatoes" have quite different meanings.

Overall I think all languages have their foibles, and trying to hold one widely used natural language up as "more regular" or "more difficult" is a pretty fruitless endeavour. Though it is fun to talk about ;)

> A slice of bread, a loaf of bread, a bread roll (Hey, why did that one come after the 'bread'...)

This doesn't seem that unusual to me, all things considered. "Bread", as a word, is more of a substance-noun than a discrete object-noun.

Moreover, "slice" and "loaf" don't strike me as words which give meaning to the phrases "slice of bread" or "loaf of bread"- in fact, it's the other way around. For instance, "slice" is the primary noun, and "bread" is just meant to distinguish it from other "slices" (e.g. "slice of pizza").

So, when I say "Pass me two slices of pizza", I'm really saying "Pass me two 'slice-of-pizza's", rather than "Pass me 'two-slices' of pizza".

You're somehow comparing 3 verb conjugations in English vs. 186 for "kuru". Well, I hate to break it to you, but there aren't 186 conjugations for "kuru". There are exactly 6, or 9 if you count the formal/archaic forms[1]. There may be 186 forms you can build with auxiliaries, but then you'd have to compare to all the variants you can have in English with may, can, shall, etc.

1. https://ja.wikipedia.org/wiki/%E3%82%AB%E8%A1%8C%E5%A4%89%E6...

Edit:

> Past tense, isn't that the same as conjugations? Also your rules don't really work. "tanoshii" => "tanoshiideshita"? Pretty sure that's not a word. The correct past tense is "tanoshikatta [desu]".

'tanoshiideshita' is the kind of mistake you make when you're not taught that, in Japanese, adjectives conjugate. Sadly, a lot of material glosses over that fact.

Similarly, most material for non-natives likes to talk about the -masu form, then describes things as "-masu form without masu" (sigh).

Accumulating "knowledge" from such material, you end up with simplified rules like in GP, which work in some cases, but don't in many others.

Then when you dive more into the language, you either encounter new forms and consider them as such, and are crushed under the sheer number of forms, or have to basically start over, deconstruct what you learned and realize that, in fact, it's all much simpler and structured than what you thought, and what made it all more complex is all the learning material for beginners.

In some ways, it's like maths.

Coming back to the 186 forms for "kuru", I'm sure you only end up with that because of that same learning material "limitations". So you probably end up counting "konai", "konakatta", "konakute", "konakereba", and many other forms as forms of "kuru", when, in fact, they are one form of "kuru" with variants of "nai".
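
The "one form of kuru plus variants of nai" view can be sketched as follows. This is an illustration under assumptions: `KURU_NEGATIVE_STEM`, `NAI_FORMS` and `negative_forms` are hypothetical names, and only the four forms quoted above are covered.

```python
# One stem of "kuru" that takes the negative auxiliary.
KURU_NEGATIVE_STEM = "ko"

# "nai" inflects on its own; these four endings are the ones behind
# the forms quoted in the comment above.
NAI_FORMS = {
    "plain": "nai",
    "past": "nakatta",
    "te-form": "nakute",
    "conditional": "nakereba",
}


def negative_forms(stem):
    """Attach every listed form of 'nai' to a single verb stem."""
    return {label: stem + ending for label, ending in NAI_FORMS.items()}


forms = negative_forms(KURU_NEGATIVE_STEM)
for label, form in forms.items():
    print(f"{label:12} {form}")
```

Counted this way the table grows by addition (one stem, plus however many forms "nai" has) rather than by listing every combined string as a separate form of "kuru".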

The same material will e.g. also tell you about "-kunai" for the negative form of "-i adjectives", but fail to mention that it's actually "-ku+nai", which explains why you will find forms like "-ku ha nai" or "-ku mo nai". I've never seen those explained in textbooks, but I'm sure it's not pretty.

No, that's not counting auxiliaries (except "-masu"). The "explosion" comes from the fact that many conjugations can themselves be conjugated (e.g. "konai" can be conjugated as an i-adjective). I don't think katsuyōkei should be counted as conjugations because they're not words by themselves. Like, obviously a godan verb has 5 possible root endings but that doesn't mean it has 5 conjugations.

If we're counting auxiliary verbs, my system can recognize more than 4000 verb/adjective endings.

>Also your rules don't really work. "tanoshii" => "tanoshiideshita"?

I learned this quite early on, but I'm still a little confused about exactly when to conjugate desu instead.

For me, it helps to think of verbal ("-i") adjectives as the same thing as a verb. Or rather, that they're both just predicates about the topic.

tanoshikatta is literally a predicate stating "was-fun", and desu is just a formality afterwards to make it polite.

This differs from the other kind of adjective ("noun adjectives"), which can't conjugate themselves, so you need desu to change instead.

Of course, the even more polite form is "tanoshuu gozaimashita" (which comes from tanoshiku gozaimashita), but even then it seems to me to be the same form as tanoshikatta if you accept that the latter could've derived from "tanoshiku atta".

I'm not a linguist though, so I do not know if the above ideas are correct, but it's the way I understand Japanese verbs.

The main problem beginning Japanese learners often face is that they are taught polite form before plain form. Polite form is a natural extension of plain form, but if you start with that, it's actually quite mind bending to back track to plain form. The secret is to abandon polite form entirely until you are relatively fluent with plain form and then add polite form back in.

For example, "tanoshii" is present/future tense. "tanoshikatta" is past tense. If you want to make it polite, then you just add "desu". Super easy.

While it is grammatically incorrect, it is completely acceptable in normal conversation to do the same with the negation. "tanoshikunai" is the negation. Past tense negation is "tanoshikunakatta" (ye gods, I can't read romaji...). You can do exactly the same thing to make it polite -- just jam "desu" on the end. That's what every child will do. The wrong bit is that "tanoshikunai desu" should really be "tanoshiku arimasen".

For "na" adjectives, it works differently. "suki" is present tense. To make it polite: "suki desu". Past tense is "suki datta". To make it polite "suki deshita". Negation is "suki de wa nai" (seriously, romaji makes me cringe...). Polite negation is "suki de wa arimasen" (though you can very much get away with the mistake of saying "suki de wa nai desu" -- again, every single child speaks this way).

Past tense negation is "suki de wa nakatta". Polite is "suki de wa arimasen deshita" (but again, the easy way is "suki de wa nakatta desu").
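
The two paradigms described so far fit in a short Python sketch. It covers only the forms named in this comment, in romaji, and the function names `i_adjective` and `na_adjective` are hypothetical. For polite negative i-adjectives it uses the colloquial "-kunai desu" pattern, which the comment itself flags as acceptable in conversation but not strictly correct.

```python
def i_adjective(adj, past=False, negative=False, polite=False):
    """Inflect an i-adjective given in romaji, e.g. "tanoshii"."""
    if not adj.endswith("i"):
        raise ValueError("expected an i-adjective ending in 'i'")
    stem = adj[:-1]  # tanoshii -> tanoshi
    if negative:
        form = stem + "ku" + ("nakatta" if past else "nai")
    else:
        form = stem + "katta" if past else adj
    # The adjective itself carries tense; "desu" only adds politeness.
    return form + " desu" if polite else form


# Copula forms keyed by (past, negative, polite), as listed in the comment.
COPULA = {
    (False, False, False): "",
    (False, False, True): "desu",
    (True, False, False): "datta",
    (True, False, True): "deshita",
    (False, True, False): "de wa nai",
    (False, True, True): "de wa arimasen",
    (True, True, False): "de wa nakatta",
    (True, True, True): "de wa arimasen deshita",
}


def na_adjective(adj, past=False, negative=False, polite=False):
    """Inflect a na-adjective such as "suki": only the copula changes."""
    return (adj + " " + COPULA[(past, negative, polite)]).strip()


print(i_adjective("tanoshii", past=True, polite=True))
print(na_adjective("suki", past=True, negative=True, polite=True))
```

The asymmetry is visible in the code: the i-adjective function rewrites the word itself, while the na-adjective function never touches `adj` and only picks a copula.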

So, why is it like this? The reason is that "i" adjectives were originally verbs that had a different set of inflections/conjugations. Very obscure piece of trivia (that most Japanese people don't even know) is that "ohayou gozaimasu" is actually one of those conjugations -- it's actually "(honourific) o hayai de gozaru" in polite form. The "i" ending mixes with "de" to produce the "ou" ending. Anyway, the point is that you have to inflect it because it is literally a verb that is modifying a noun.

"na" adjectives on the other hand are actually adjectives. They are called "na" adjectives because you have to add "na" when modifying the noun. For example, "suki na hito". The "na" is actually a contraction of "ni aru" -- because in Japanese you can only modify nouns with verb phrases.

So this is why there is a difference between the negation of "i" adjectives and "na" adjectives. "ku" is the verb combining form of the old style "i" verbs (like "te" is on modern verbs). So "tanoshikunai" is really "tanoshiku nai" -- you are combining the "tanoshi" verb with the "nai" verb. On the other hand "suki" is actually an adjective, not a verb, so you have to say "suki de wa nai" -- you can't combine them.

Past tense is exactly the same. In "tanoshikunakatta", it's really combining 2 verbs and conjugating the last one as per the rules ("tanoshiku nakatta"). If you want to make it polite, the polite past tense of "nai" is "arimasen deshita" (but you can get away with "nakatta desu" in virtually every situation).

With "na" adjectives -- "suki de wa nakatta", we've conjugated the only verb. Again to make it polite you can say "suki de wa arimasen deshita" (or "suki de wa nakatta desu" if you want to sound like an uneducated bumpkin like me).

Hope this helps! Avoid polite form until you can handle plain form and it's almost all completely logical ;-)

Edit: Fix past tense in the examples of incorrect, but acceptable polite forms.

> The wrong bit is that "tanoshikunai desu" should really be "tanoshiku arimasen".

While it should technically be -ku arimasen, it's actually rarely used, and -kunai desu is more "mainstream".

> Past tense is "suki datta". To make it polite "suki deshita". Negation is "suki de wa nai" (seriously, romaji makes me cringe...). Polite negation is "suki de wa arimasen"

Trivia: all these forms are really variations of "suki de aru". "datta" comes from "de atta", "de ha nai" is really "de nai" with a "ha" for emphasis. "de ha arimasen" is really "de aru", with the "ha" for emphasis, and "aru" conjugated with the "masu" auxiliary at the negative form.

> Very obscure piece of trivia (that most Japanese people don't even know) is that "ohayou gozaimasu" is actually one of those conjugations -- it's actually "(honourific) o hayai de gozaru" in polite form. The "i" ending mixes with "de" to produce the "ou" ending.

Technically speaking, the -i and the de are not combining at all. The -u form (ウ音便) comes from the -ku form (連用形), where the k is removed. Then the preceding sound also changes (like in arigataku -> arigatou ; oishiku -> oishuu, etc.). The typical forms used in keigo are -u gozaimasu and -u zonjimasu (where there is no "de" to combine in the latter ;) )
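
The sound change described above can be sketched for the two romaji patterns that appear in the examples. This is a rough sketch, not a full account of ウ音便: the function name `u_onbin` is hypothetical, and any ending other than "-aku" or "-shiku" is rejected rather than guessed at.

```python
def u_onbin(ku_form):
    """Turn a -ku form into its -u form for two romaji patterns only."""
    if ku_form.endswith("shiku"):
        # k drops out and "shi" + "u" is written "shuu": oishiku -> oishuu
        return ku_form[: -len("shiku")] + "shuu"
    if ku_form.endswith("aku"):
        # k drops out and "a" + "u" becomes "ou": arigataku -> arigatou
        return ku_form[: -len("aku")] + "ou"
    raise ValueError(f"pattern not covered by this sketch: {ku_form}")


for word in ["arigataku", "oishiku", "hayaku", "tanoshiku"]:
    print(word, "->", u_onbin(word), "gozaimasu")
```

Note that the input is always the -ku form; no "de" appears anywhere in the derivation.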

I think the Japanese learn the ウ音便 in 国語 or 古文, so I don't think it's some obscure trivia that few people know. In fact, you can hear it in e.g. 時代劇 dramas.

Also, the form is pretty common in Kansai dialect (without gozaimasu). In fact, Wikipedia claims[1] it comes from there and the gozaimasu was added in Kantou.

1. https://ja.wikipedia.org/wiki/%E9%9F%B3%E4%BE%BF#.E5.BD.A2.E...

> The main problem beginning Japanese learners often face is that they are taught polite form before plain form.

I often heard that the rationale for doing so is that learners don't sound impolite. It kind of makes sense, but on the other hand, I think it's also part of the "日本語が上手ですね" problem, that is, for the Japanese, the polite form is rather advanced.

Anyways, I do agree that for learners it would all make more sense to start from the basics and learn the polite forms later. Textbooks tend not to, though, sadly.

If I could upvote something multiple times, it'd be this post. The conjugations are well documented and can be intuited with enough exposure, but the insight as to why (い-verbs) and common usages (お早うございます) is something clearly lacking from the literature.
Counters exist in English.

"5 head of cattle" is exactly analogous to "ushi go tou" (牛五頭)

Most English counters have left common usage, but the concept exists. Similar, but not the same: many people enjoy learning all the names for groups of animals in English, "a murder of crows" for example.

Oh, I just remembered another fun feature of Japanese verb conjugation.

Can you conjugate the (regular) verb "へる" (heru)? There happen to be two verbs that are pronounced "heru" but with completely different conjugations! Do you remember which one is which? In fact any verb that ends with -eru or -aru can be potentially conjugated like an ichidan verb or a godan verb. For each such verb you have to remember its conjugation class, lest you conjugate it completely wrong. So these aren't so regular after all.
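
A small sketch of why the class has to be memorized: the same romaji dictionary form yields different results depending on a class label that cannot be read off the spelling. The function name `conjugate_ru_verb` is hypothetical, and only three forms of -ru verbs are handled. I believe the pair meant here is 減る (godan) and 経る (ichidan), but that reading is my assumption, not something stated in the comment.

```python
def conjugate_ru_verb(verb, verb_class):
    """Conjugate a verb ending in -ru as either ichidan or godan."""
    if not verb.endswith("ru"):
        raise ValueError("this sketch only handles verbs ending in -ru")
    stem = verb[:-2]  # heru -> he
    if verb_class == "ichidan":
        # ichidan: drop -ru and attach the ending directly
        return {"negative": stem + "nai", "past": stem + "ta", "polite": stem + "masu"}
    if verb_class == "godan":
        # godan: the r stays and the vowel shifts (ra / ri); past geminates
        return {"negative": stem + "ranai", "past": stem + "tta", "polite": stem + "rimasu"}
    raise ValueError(f"unknown verb class: {verb_class}")


for verb_class in ("ichidan", "godan"):
    print(verb_class, conjugate_ru_verb("heru", verb_class))
```

Both rule sets are perfectly regular; the irregular part is the extra argument, which has to come from the dictionary.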

Also a related topic is intransitive/transitive verb pairs. These are also heavily irregular, and you must use the correct one in a sentence or it will be ungrammatical.

I think this is overstating the regularity of Japanese a bit. For example, what is the past tense of "kuru"? What is the present tense of "datta"?
Yes. It is overstating Japanese regularity a very little bit.

And understating the irregularity of English an enormous amount.

My Japanese's pretty rusty, so corrections welcome.

IMO you're not "overstating Japanese [verb][0] regularity a very little bit", but severely overstating it. The ichi-dan / go-dan verbs are indeed quite regular, and though there are irregular verbs like the grandparent post's example, IIRC there aren't actually that many (three?). However, that's not the only "axis" along which one could conjugate Japanese verbs. The complicated parts are the transitive / intransitive forms and the compound verbs. While some rules apply, they are more like the "I before E, except after C" English spelling rule.

It would be akin to saying that there are only three verb tenses in English: past, present, and future; simple, right? While it's technically not wrong, it's glossing over the progressive, perfect, and subjunctive forms (maybe there are more, don't remember), and their combinations.

[0] Hopefully from context it's clear we're only talking about grammar pertaining to Japanese verbs, and not Japanese in general. Japanese grammar has plenty of quirks, like how some colors are adjectives (red: akai, blue: aoi) but others are nouns (green: midori, purple: murasaki), or how loan words are usually written in katakana, but tobacco is not, etc.

The thing with English is people in general can understand most things even if you screw up tenses or tones. You put a few words together and people will mostly get you. In Japanese you have to remember extra things to add so you identify different parts of a sentence. Same with gender in Romance languages. They just add more complexity. In a way Korean and Chinese are a little easier in terms of rules.

You should definitely look at Turkish. It belongs to the same family as Japanese, but the verbs are even bigger: there are more suffix variants that attach to a verb. Sometimes a whole English sentence can be translated into one Turkish word. Have a look: "okutamadıklarımızdansınız" is just one word. It means "you are one of those whom we can not make read".

If you meant "family" in the linguistic sense, then no. Japanese belongs to the Japonic language family, and Turkish to the Turkic family. The proposed Altaic family (which was a parent to both in its expanded "Macro-Altaic" version) was discredited, and Japonic is generally classified as an isolated language family.

"can not make" or "cannot make"?

> makes me want to point out that it's not the whole story

Surely that's the whole point of the "8020" in the URL?

Personally I thought it was an excellent, compact, understandable introduction. Sure it leaves things out, but if it didn't it would be a textbook!

The conjugation of Japanese verbs is non-negotiable - it has to be there for you to make any sense, and I don't think OP was claiming that you can move(?) how a verb is conjugated - just that the subject and object order doesn't matter (which is correct). The tense or conjugation of the verb has nothing to do with that.
Why do you use nemuru as the root? Mostly I would hear and say neru. So I could not sleep would become nerarenakatta.
寝る(neru) and 眠る(nemuru) are different verbs. Different kanji in writing, own conjugations.

Neru tends to indicate more intentional sleep (arranging to go to sleep: deciding to retire to bed, to lie down etc), whereas nemuru is spontaneous sleep (falling asleep --- and not necessarily lying down).

"Nerarenakatta" has the possible interpretation of something like, "I couldn't sleep (because of no opportunity to get away from activity and lie down)" whereas "nemurenakatta" is "I couldn't fall asleep" (insomnia).

I think if you stayed up all night studying and so because of that you couldn't sleep, that's when you might best use "nerarenakatta". I couldn't sleep (because I needed to do something else with the time, not due to failure to fall asleep).