Hacker News
by griffzhowl 263 days ago
Oh, true, it was just a mistake to exclude verbs. Of course they should count as vocabulary.

But I think of pronouns as grammatical, as well as the auxiliary particles in verb forms like "there is", "to go to", etc. So "have" and "is" can function grammatically when they're part of the verb form of another root verb, like "have been seen" and so on.

"Do" is obviously semantic when it's the main verb, e.g. "I'm doing my job" versus "I'm leaving my job". In the selection you quoted it also plays a grammatical role, which is just to point back to the main verb of the sentence, i.e. it could be replaced by repeating "get exposed to (titbits from)" without changing the meaning of the sentence.

So in "there is also a lot of influence from French", I would put "there _ also a _ of _ from _" as grammatical.

I'm sure my way is naive, but it's based I think on well-established categories. I'm not sure how linguists would distinguish grammatical words or even if they categorize based on words at all. For example, "a lot of" as a quantifier might be completely grammatical, same as "more", "less", "thirty", etc.


> I think of pronouns as grammatical, as well as the auxiliary particles in verb forms like "there is", "to go to", etc.

"There" would not usually be considered a particle. It is a noun, but one that has no semantics whatever; it is there only to satisfy the grammatical rule requiring the verb in that clause to have a subject. (The term of art here is, straightforwardly enough, "dummy subject".)

You could ask questions about extraposition (as in "it's tragic that XXXXX", which is equivalent to "that XXXXX is tragic"); "there is [noun]" is obviously similar in some ways and less similar in others. One way in which it has grown less similar over time is agreement: the verb used to agree in number with [noun], but today it is more commonly just "is", appearing to agree with "there" regardless of whether [noun] is singular or plural.

> "Do" is obviously semantic when it's the main verb, e.g. "I'm doing my job" versus "I'm leaving my job".

I don't think this is so obvious. "Do" (as a primary verb) is a verb in the same way that "thing" is a noun: it has all the same grammatical properties, and usually no semantic content. (Technically, since we have two meaningfully distinct classes of noun, count and mass, we need more than one empty noun. The counterpart to "thing" is "stuff". These do technically differ in their semantics, conveying the speaker's idea of how divisible the objects or materials in question are.)

In your example, I would say that "doing" is closely related to "job" and the semantics (still pretty weak) arise from the pairing. You can "do" many things by relying on conventional fixed expressions. But if I were to remark to you that my friend was "doing a book", I suspect that you wouldn't know what that meant. Maybe my friend is an author. Maybe he's an illustrator. Maybe he's an editor. Maybe he runs a press. Some words are vaguer than others; "do" is maximally vague.

> I'm sure my way is naive, but it's based I think on well-established categories.

Mostly, yes. Adverbs can be a bit hazier than nouns, verbs, and adjectives. You did yourself a big favor by defining a miscellaneous "other" category.

I will note that I excluded "more" (in "more tidbits", but not in "more exposed", where it's an adverb) from the semantic category on the grounds that it is a determiner (the same part of speech as "the"). This is something I think you might not have anticipated. I should also note that "also" is an adverb (adverbs are very broadly defined), so your methodology rated it as semantic. I think I rated it as 70% grammatical.

Prepositions are difficult to deal with. (This is true of almost every language.) For "there is a lot of influence from French", my view is the following:

(1) "From" has fundamental semantics involving something being in a certain location and then moving out of that location;

(2) in this specific use, those semantics are close to the surface. A foreigner putting this phrase together would likely be able to guess that "from" was the right preposition to use.

Contrast something like "refrain [from]", where the semantics are still not entirely gone, but the foreigner is going to have a much harder time.

I didn't want to think very hard about exactly how much the semantics were present in prepositions, so if I thought they were present in a nontrivial way, I gave them 50%.

> "a lot of" as a quantifier might be completely grammatical, same as "more", "less", "thirty", etc.

I had a lot of trouble with "thirty" and ended up scoring it as an adjective for the unprincipled reason that this would make it count as semantic. Grammatically, the least we can say is that it's not a normal adjective. This is also true of "more" and "less" (about which we can say more), so good eye.

"A lot [of]" is heavily grammaticalized and this process appears to be continuing. Here's a blog post observing that native speakers often think of "a lot" as a single word: https://hyperboleandahalf.blogspot.com/2010/04/alot-is-bette...

It's not quite the same thing as "more" and "less", though. They can substitute for it:

A lot of the students...

More of the students...

But it can't substitute for them:

More students...

*A lot students...

This problem won't go away if we include the "of"; then we'd get

*More the students...

I think it's better not to include the "of".

> I'm not sure how linguists would distinguish grammatical words or even if they categorize based on words at all.

Linguists use "word" to mean an atomic element. Exactly which parts of a certain stretch of speech are atomic depends on the analysis you're trying to do, and linguists have explicit terms for elements that are atomic at different levels or in different ways. By default a "word" would probably be taken to mean a lexeme, which is something that requires its own dictionary entry. A "morpheme" is something like "the smallest element to which we can assign independent significance" and would rarely be considered a "word". At this level you might observe that the stem of "fascinating" derives from Latin but its -ing ending, a separate morpheme, does not. A "phoneme" is a sound that is meaningfully distinct from other sounds, and would never be called a "word".

There is a concept of a "clitic", which is something that behaves like an independent word in some ways and like a dependent particle or inflection in other ways. This is almost always a lexeme that is pronounced as if it is part of a nearby word. I don't know of a term for "pronunciational atom", but I wouldn't be surprised if there is one.

Linguists make all kinds of observations about how certain words are semantically weak or in the process of losing their semantics ("semantic bleaching"). And of course they also make all kinds of observations about grammatical rules. So "how grammatical is this word" is definitely a question close to the heart of linguistics, but as you note the concepts are difficult to define and the question often cannot be answered rigorously as applied to particular words.