by projectant
3163 days ago
No offence, but with a megabyte of text per speaker I feel I could do better with word n-gram Markov models (even bigrams) and random path choice. I think deep learning is great and all (and I'm meaning to learn it), but shouldn't it be able to do far better than Markov models, or other simple things? Image captioning? Incredible. DeepMind winning video games? Incredible. Style transfer? Incredible. With one exception (I saw it on HN ages ago, sorry I have no link; it basically generates novel text using deep learning, across all sorts of genres, such as "academic paper", "math paper", "novel", "film script", and I found the results remarkable and interesting), I question whether many text applications are doing better than Markov. I think the issue is that there is something fundamental and sophisticated about human language which our current deep learning models, with all their omniscient benevolence (or whatever), are missing. There's something deep about the structure of language that we are not modelling yet in deep learning, as far as I've seen. When we do... boom... computers that learn from the internet and amaze us all. Then we'll have something to shine, smile about or fear. Sorry for the digression and what may be inapplicable comparisons. I can get impassioned about this topic.
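For anyone who hasn't seen one, here is a minimal sketch of the kind of word-bigram Markov model with random path choice mentioned above. The function names and the toy corpus are made up for illustration; a real attempt would train on the per-speaker text and probably use longer n-grams.

```python
import random
from collections import defaultdict


def build_bigram_model(text):
    """Map each word to the list of words that follow it in the text.

    Repeated followers are kept, so a uniform random choice from the list
    is automatically weighted by how often each bigram occurred.
    """
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def generate(model, start, max_words=20, rng=None):
    """Random walk through the bigram table, starting from `start`."""
    rng = rng or random.Random()
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = model.get(word)
        if not followers:
            # Dead end: this word was never followed by anything.
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = build_bigram_model(corpus)
print(generate(model, "the", max_words=10, rng=random.Random(0)))
```

Every adjacent word pair in the output has been seen in the training text, which is why this sort of thing reads as locally plausible while having no idea what it is talking about.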
|
I think the secret sauce that's missing from deep learning, as well as from any other kind of statistical language model, is a representation of the context outside language itself.
What I mean is, when we humans [1] communicate using language, the language we generate (and recognise) does not carry all of the information that we want to convey. A lot of the meaning in our utterances ...is not in our utterances.
We haven't really found any way to represent this (dare I say) deep context yet. In general, in NLP, even the word "context" means the context of a token, in other words the tokens around it. Even mighty word vectors work that way.
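To make that narrow sense of "context" concrete, here is a rough sketch of how a token's context is typically collected as a window of neighbouring tokens. The function name and the window size are my own choices for illustration, not any particular library's API.

```python
def context_windows(tokens, window=2):
    """Pair each token with the tokens at most `window` positions away."""
    pairs = []
    for i, token in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((token, left + right))
    return pairs


tokens = "the bank of the river".split()
for token, context in context_windows(tokens, window=1):
    print(token, context)
```

With `window=1`, the only "context" recorded for "bank" is `['the', 'of']`. Nothing about what the speaker knows or intends ever enters the picture.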
The problem is, of course, that it's very hard to even find data to train on, if you want to model that context with some machine learning algorithm. How do you represent everything that a person might know about the world when they speak or write something?
But without that deep context, any utterance is just random noise, even if it's structurally correct. So we're left with a bunch of techniques that are damn good at modelling structure, but when it comes to meaning, we fail.
___________
[1] We are all humans here, right? Just in case: I love AI! Go robots!