by IIAOPSW
1182 days ago
You're digging your heels in on a rehash of a model from the 40s, glibly dismissing the problems it doesn't account for, brought up by linguists in the 50s and 60s, as if they were unaware that babies go through a period of babbling. The amount of time spent acquiring language is already priced in, and it is not enough to account for acquisition as pure reward and training.

>Are you sure we don't simulate in our head what would happen if we drove the car into the lamp post / brick wall / other car / person, etc.?

You left out the 10k times part. You're ignoring the huge training data sizes these models need even for basic inferences. No, I don't think it takes all that much full-scale simulation to distill car speed as a function of pedal parameters and estimate the control problem involved. In many instances, humans can seemingly extrapolate from far less data. The algorithms to do this are missing. Training with loads more data isn't a viable long-term substitute.
|
Depends. In principle, you can't learn an infinite language from finite examples only, and you need both positive and negative ones for superfinite classes of languages (which already include the regular languages). Gold's result and so on. OK so far.
The problem is that in order to get infinite strings from a human language, you need to use its infinite ability for embedding parenthetical sentences: John, the friend of Mary, who married June, who is the daughter of Susan, who went to school with Babe, who ...
But, while this is possible in principle, in practice there's a limit to how long such a sentence can be; or any sentence, really. Most of the utterances humans generate are going to be not only finite but relatively short, that is, short relative to the physical limit on the length of an utterance a human could plausibly produce. That limit must be somewhere around the length of the Iliad, considering that said human has to keep the entire utterance in memory or lose the thread, and that the Iliad probably went on for as long as one could stand to recite from memory. Or perhaps to listen to someone recite from memory...
Obviously, there are only a finite number of sentences of bounded length, given a fixed vocabulary, so _in practice_ language, as spoken by humans, is not actually-really infinite. Or, let's say that humans really do have a generator of infinite language in our heads, but an outside observer would never see the entire language being produced, because finite universe.
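To put a rough number on that: the finiteness holds once utterance length is capped at some maximum. A minimal sketch of the count, where `vocab_size` and `max_len` are illustrative parameters of mine and not measurements of any real language. It counts every word sequence, grammatical or not, so it is only an upper bound:

```python
def count_sentences(vocab_size: int, max_len: int) -> int:
    """Upper bound on distinct sentences: all word sequences of length 1..max_len.

    Assumes a fixed vocabulary and a hard cap on length; both are
    hypothetical parameters chosen for illustration.
    """
    return sum(vocab_size ** k for k in range(1, max_len + 1))


# Python integers are arbitrary precision, so large caps are fine.
bound = count_sentences(vocab_size=50_000, max_len=20)
print(f"about 10^{len(str(bound)) - 1} possible word sequences")
```

The bound is astronomically large, but it is a finite number, which is all the argument needs.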
Which means that Chomsky's argument about the poverty of the stimulus might apply to human learning, because it's very clear we learn some kind of complete model of language as we grow up; but it doesn't need to apply to statistical modelling, i.e. the approximation of language by taking statistics over large text corpora. Given that those large corpora will only have finite utterances, and relatively short ones at that (as I'm supposing above), it should be possible to at least learn the structure of everyday spoken language just from text statistics.
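For concreteness, the simplest version of "taking statistics over text" is a bigram model. This is a toy sketch, not a claim about how any real language model is built; the three-sentence corpus, the `<s>` and `</s>` boundary markers and the function names are all made up for illustration:

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # <s> and </s> are hypothetical start/end-of-sentence markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Relative-frequency estimate of P(next word | previous word)."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}
    return {word: c / total for word, c in counts[prev].items()}


corpus = ["john saw mary", "john saw june", "mary saw john"]
model = train_bigrams(corpus)
print(next_word_probs(model, "john"))
```

Nothing here can represent unbounded embedding, which is the point: it only ever sees, and only ever needs to model, short finite strings.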
So training with lots of data can be a viable long-term solution, as long as what's required is only to model the practical parts of language, rather than the entire language. I think we've had plenty of evidence that this should be possible since the 1980s or so.
Now, if someone wanted to get a language model to write like Dostoyevsky...