|
>> Training with loads of more data isn't a viable long term substitution.

Depends. In principle, you can't learn an infinite language from finite
examples only and you need both positive and negative ones for super-regular
languages. Gold's result and so on. OK so far.

The problem is that in order to get infinite strings from a human language,
you need to use its infinite ability for embedding parenthetical sentences:
John, the friend of Mary, who married June, who is the daughter of Susan, who
went to school with Babe, who ...

But, while this is possible in principle, in practice there's a limit to
how long such a sentence can be; or any sentence, really. In practice, most of
the utterances generated by humans are going to be not only finite, but
relatively short, as in short "relative" to the physical limit of utterance
length a human could plausibly produce (which must be something around the
length of the Iliad, considering that said human should be able to keep the
entire utterance in memory, or lose the thread; and that the Iliad probably
went on for as long as one could stand to recite from memory. Or perhaps to
listen to someone recite from memory...).

Obviously, there are only a finite number of sentences of bounded length, given
a fixed vocabulary, so _in practice_ language, as spoken by humans, is not
actually-really infinite. Or, let's say that humans really do have a generator
of infinite language in our heads, but an outside observer would never see the
entire language being produced, because finite universe.

Which means that Chomsky's argument about the poverty of the stimulus might
apply to human learning, because it's very clear we learn some kind of
complete model of language as we grow up; but, it doesn't need to apply to
statistical modelling, i.e. the approximation of language by taking statistics
over large text corpora. Given that those large corpora will only have finite
utterances, and relatively short ones at that (as I'm supposing above) then it
should be possible to at least learn the structure of everyday spoken
language, just from text statistics.

So training with lots of data can be a viable long term solution, as long as
what's required is to only model the practical parts of language, rather than
the entire language. I think we've had plenty of evidence that this should be
possible since the 1980s or so.

Now, if someone wanted to get a language model to write like Dostoyevsky... |
Everything you said applies to computers too. Real machines have physical memory constraints.
Sure, the set of real sentences may be technically finite, but it grows exponentially with sentence length and you don't have the compute resources to keep up.
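To put rough numbers on that growth, here is a back-of-the-envelope sketch. The 10,000-word vocabulary is an assumption picked for illustration, and the function counts every possible word string, grammatical or not, so treat it as an upper bound rather than a count of real sentences.

```python
def count_strings(vocab_size: int, max_len: int) -> int:
    """Count all word strings of length 1..max_len over a fixed vocabulary.

    This is the sum of vocab_size**k for k = 1..max_len. It ignores grammar,
    so it over-counts the set of valid sentences.
    """
    return sum(vocab_size ** k for k in range(1, max_len + 1))


if __name__ == "__main__":
    # Assumed vocabulary size, for illustration only.
    for n in (1, 2, 5, 10, 20):
        print(n, count_strings(10_000, n))
```

Each extra word multiplies the count by roughly the vocabulary size, which is why "finite" doesn't buy you much here.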
Information is not about what is said but about what could be said. It doesn't matter so much that not every valid permutation of words is uttered, but rather that for any set of circumstances there exist words to describe it. Each new word in the string carries information in the sense that it narrows the set of possibilities that were open before I relayed my message. A machine which picks the maximum likelihood message in all circumstances is by definition not conveying information. It's spewing entropy.
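A small sketch of the information point, in Shannon's sense. The next-word distribution is made up for illustration, and reading "maximum likelihood message" as greedy decoding in a fixed context is my assumption about how to model it.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Hypothetical next-word distribution in some fixed context.
next_word = [0.5, 0.25, 0.125, 0.125]

# Greedy decoding: all probability mass moves to the single most likely word,
# so the output is fully determined by the context.
top = max(range(len(next_word)), key=next_word.__getitem__)
greedy = [1.0 if i == top else 0.0 for i in range(len(next_word))]

if __name__ == "__main__":
    print(entropy_bits(next_word))  # uncertainty resolved by a sampled word
    print(entropy_bits(greedy))     # uncertainty resolved by the greedy word
```

With these numbers a sampled word resolves 1.75 bits of uncertainty, while the greedy choice resolves none: once you know the context, you already know what the machine will say.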