Hacker News
by canjobear 2157 days ago
> A very basic Markov model can come up with content that seems surprisingly like something a human would say.

This is false. Natural language involves long-term dependencies that are beyond the ability of any fixed-order Markov model to handle. GPT-2 and GPT-3 can reproduce those dependencies reliably.
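
To make the limitation concrete, here is a minimal sketch of a word-level Markov model in Python. The toy corpus and the names `train_markov` and `generate` are made up for illustration; this is not anyone's production code, just the textbook idea of predicting the next word from the last `order` words.

```python
import random
from collections import defaultdict

def train_markov(tokens, order=1):
    """Map each context of `order` tokens to the tokens that followed it."""
    table = defaultdict(list)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        table[context].append(tokens[i + order])
    return table

def generate(table, seed, length, rng):
    """Extend `seed` by sampling followers of the last len(seed) tokens."""
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break  # unseen context: the model has nothing to say
        out.append(rng.choice(followers))
    return out

# Hypothetical toy corpus: the verb must agree with a subject several words back.
corpus = ("the dogs that chased the cat are tired . "
          "the dog that chased the cats is tired .").split()

table = train_markov(corpus, order=1)
print(table[("cat",)])   # what the model has seen after "cat"
print(table[("cats",)])  # what the model has seen after "cats"
print(" ".join(generate(table, ("the",), 8, random.Random(0))))
```

With `order=1` the only thing the model knows after "cat" is "cat", so it will happily continue "the dog that chased the cat" with "are". A larger `order` widens the window, but the window is still fixed, and the agreeing subject can always sit one word further back.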

> If anything, what all of the OpenAI hype should confirm is just how predictable and regular human language is.

Linguists have been trying to write down formal grammars for natural languages since the 1950s. Some of the brightest people around have essentially devoted their lives to this task. And yet no one has ever produced a complete grammar of any human language. So no, human language is not predictable and regular, at least not in any way that we know how to describe formally.

1 comments

W.r.t. the Markov model, I just mean that something even that trivial can sound lifelike. It's not surprising that throwing billions of times more data at the problem with more structure can make the parroting better.

> So no, human language is not predictable and regular, at least not in any way that we know how to describe formally.

I don't know what to say about this, other than that I disagree. Perhaps the NLP community has been a little too "academic" here.

Grade schoolers are routinely made to draw those boring sentence diagrams for their particular language, and those diagrams have tremendous structure. When you combine that structure (the function) with the data of billions of real-world people talking, it's not surprising that the curve fit looks like the real thing. Given how powerful things like word2vec have been while doing very, very simple things like taking vector differences between words, it's not surprising to me that the state of the art is doing this.
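
For anyone who hasn't seen the word2vec trick, here is a rough sketch of the vector-difference idea in Python. The two-dimensional vectors are hand-picked for illustration (real word2vec embeddings are learned and have hundreds of dimensions), and `cosine` and `analogy` are hypothetical helper names, not part of any library.

```python
import math

# Hand-made toy "embeddings": first axis ~ royalty, second axis ~ gender.
# These numbers are assumptions for the example, not trained values.
vectors = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))
```

The whole operation is a subtraction, an addition, and a nearest-neighbour lookup, which is the sense in which it is "very, very simple".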

It is surprising! You could throw all the data of the entire human race at a Markov model and it would not sound a tenth as good as even GPT-2. Transformers are simply in a new class.