by throwaway-1209 3290 days ago
The whole field of NLP and computational linguistics reminds me of that joke where a drunk is looking for his keys under a street lamp instead of where he actually lost them.

This is true in particular of anything that pertains to reasoning and knowledge representation. People still are trying to "infer rules" and do logical, rather than probabilistic reasoning. I get why that is. To me though, the kind of real life reasoning that humans do seems heavily probabilistic and contextual, Bayesian almost. And there's next to no notable work going on in that direction.

2 comments

>> People still are trying to "infer rules" and do logical, rather than probabilistic reasoning. I get why that is.

That is because it's very hard to collect statistics on something that you can't really quantify: meaning, in this case.

There was a thread on HN a couple of days ago about a blog post where someone was experimenting with, among other things, training an LSTM network to generate Java programs [1]. In one example, the LSTM did really well at reproducing the structure of a Java program: import declarations, followed by a class implementing an interface, with a few methods with structured comments and throws declarations and everything, and even a test!

On the other hand, this program was completely useless. From a cursory glance it would probably not even compile (e.g. it referred to undeclared variables). There was one method named "numericalMean()" that took a single double and returned an (undeclared) variable "sum". The class had a nonsensical name, "SinoutionIntegrator". The test was testing something called "Cosise", presumably a method, but not one defined in the class. In short: a mess.

That might sound a bit harsh, but I think it's a very good example of why statistical NLP is really bad at doing meaning: because there is nothing, not a shred, of meaning in examples of the data we use to train statistical models of language, i.e. text.

Because, you see, the relation between meaning and text (and even spoken language) is completely arbitrary. Or, to put it another way, there are potentially an infinite number of valid mappings between structure and meaning, of which we, human beings, somehow by convention or some other crazy mechanism, have agreed to use just one. And even though the various forms language entities take (inflections etc.) are used exactly to convey meaning, the rules of how meaning varies with structure are, again, completely independent of structure itself.

Now, we have done very well in modelling structure, from examples of it (which is what text is). But it's completely unreasonable to expect our algorithms to be able to extract meaning from it also.

And that is why people are still trying to put down the rules of meaning by hand. Because that's the only way we can think of, currently, to process meaning automatically.

________

[1] https://news.ycombinator.com/item?id=14526305

I don't think these two things are mutually exclusive.

As far as I'm aware there is work underway to take logical constructions and integrate them with probabilistic machine learning to do things like force zero probabilities in impossible input cases. That is, encoding domain knowledge into the model directly in the form of symbolic reasoning.
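
To illustrate what "forcing zero probabilities" could look like, here's a minimal sketch. It isn't taken from any particular paper or library; the variables, the rule, and the uniform scores are all made up for the example. The idea is just to mask out the assignments a logical rule forbids and renormalize what's left.

```python
from itertools import product


def constrain(scores, is_possible):
    """Zero out assignments the rule forbids, then renormalize the rest."""
    masked = {x: (s if is_possible(x) else 0.0) for x, s in scores.items()}
    total = sum(masked.values())
    if total == 0:
        raise ValueError("the rule excludes every assignment")
    return {x: s / total for x, s in masked.items()}


# Hypothetical domain: (raining, wet_ground).
# Hypothetical rule: raining implies wet_ground.
def rule(assignment):
    raining, wet_ground = assignment
    return (not raining) or wet_ground


# Stand-in for the scores a learned model might produce (uniform here).
scores = {x: 0.25 for x in product([False, True], repeat=2)}

probs = constrain(scores, rule)
for assignment, p in probs.items():
    print(assignment, round(p, 3))
```

The impossible case (raining, dry ground) ends up with probability exactly zero no matter what the statistical part says, and the remaining mass is shared among the assignments the rule allows.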

I mean, even Bayesian nets require some encoding of causality, right? Maybe I'm reading too much "blah symbolic reasoning is worthless" into your comment?

It's not worthless, per se, it's just not a precursor to AGI in any shape or form, no matter how much the researchers pretend otherwise.

Worth reading maybe?

http://reasoning.cs.ucla.edu/fetch.php?id=136&type=pdf

Abstract:

> We propose the Probabilistic Sentential Decision Diagram (PSDD): A complete and canonical representation of probability distributions defined over the models of a given propositional theory. Each parameter of a PSDD can be viewed as the (conditional) probability of making a decision in a corresponding Sentential Decision Diagram (SDD). The SDD itself is a recently proposed complete and canonical representation of propositional theories. We explore a number of interesting properties of PSDDs, including the independencies that underlie them. We show that the PSDD is a tractable representation. We further show how the parameters of a PSDD can be efficiently estimated, in closed form, from complete data. We empirically evaluate the quality of PSDDs learned from data, when we have knowledge, a priori, of the domain logical constraints.

I'm still working on my understanding, but Professor Darwiche gave a lecture on the material in one of my classes. Salient bit:

> The problem we tackle here is that of developing a representation of probability distributions in the presence of massive, logical constraints. That is, given a propositional logic theory which represents domain constraints, our goal is to develop a representation that induces a unique probability distribution over the models of the given theory.
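
For a rough feel of what "a probability distribution over the models of a theory" means, here's a brute-force sketch. To be clear, this is not a PSDD: it enumerates every assignment, which blows up exponentially, whereas the whole point of the paper is a compact, tractable representation. The three-variable theory and the data are invented for the example. With complete data, the maximum-likelihood estimate here is just the relative frequency of each model.

```python
from collections import Counter
from itertools import product


# Hypothetical propositional theory over three variables (a, b, c),
# invented for illustration: (a OR b) AND NOT (a AND c).
def theory(a, b, c):
    return (a or b) and not (a and c)


# The models of the theory: every assignment that satisfies it.
MODELS = [m for m in product([False, True], repeat=3) if theory(*m)]


def estimate(data):
    """Closed-form maximum-likelihood estimate from complete data.

    Only models of the theory receive an entry; anything that violates
    the theory is outside the support and is rejected as bad data.
    """
    if not data:
        raise ValueError("need at least one sample")
    for sample in data:
        if not theory(*sample):
            raise ValueError(f"sample {sample} violates the theory")
    counts = Counter(data)
    return {m: counts[m] / len(data) for m in MODELS}


# Made-up complete data: every sample assigns all three variables.
data = [(False, True, False)] * 2 + [(True, True, False), (True, False, False)]
dist = estimate(data)
for model, p in dist.items():
    print(model, p)
```

Assignments that violate the theory never appear in `dist` at all, so the logical constraints are respected by construction rather than learned from the data.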