by vagabondvector 3693 days ago
http://arxiv.org/abs/1603.06042

This is the paper.

The idea builds on the incremental perceptron coupled with beam search: several hypotheses are kept alive at the same time while the model learns a policy that builds them, and the best-scoring one is selected at the end.

http://dl.acm.org/citation.cfm?id=1218970
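
A minimal sketch of the beam-search half of that idea, assuming a toy tagging task, a linear (perceptron-style) scorer and hypothetical feature templates. At each step every hypothesis in the beam is extended with every tag, and only the `beam_size` best partial sequences survive.

```python
from collections import defaultdict

def features(words, i, prev_tag, tag):
    # Hypothetical feature templates: (word, tag) and (previous tag, tag).
    return [("w", words[i], tag), ("t", prev_tag, tag)]

def score(weights, feats):
    # Linear score: sum of the weights of the active features.
    return sum(weights[f] for f in feats)

def beam_search(words, tags, weights, beam_size=4):
    """Keep the beam_size best partial tag sequences after each decision."""
    beam = [(0.0, [])]  # (score so far, partial tag sequence)
    for i in range(len(words)):
        candidates = []
        for s, seq in beam:
            prev = seq[-1] if seq else "<s>"
            for t in tags:
                candidates.append((s + score(weights, features(words, i, prev, t)), seq + [t]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]  # prune to the best hypotheses
    return beam[0][1]  # highest-scoring complete sequence

# Hand-set toy weights, for illustration only.
w = defaultdict(float)
w[("w", "dogs", "NOUN")] = 1.0
w[("w", "bark", "VERB")] = 1.0
w[("t", "NOUN", "VERB")] = 0.5
print(beam_search(["dogs", "bark"], ["NOUN", "VERB"], w))  # ['NOUN', 'VERB']
```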

At the time, the incremental perceptron was the fastest approach to structured prediction, and its implementation is extremely simple.
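
To show how little code the training side needs, here is a sketch of an incremental perceptron with the "early update" trick commonly associated with it: decode with the beam, and as soon as the gold prefix falls out of the beam, update toward the gold prefix and away from the best-scoring prefix, then stop processing that sentence. The feature templates, beam size and toy data are hypothetical.

```python
from collections import defaultdict

TAGS = ["NOUN", "VERB"]

def features(words, i, prev_tag, tag):
    # Hypothetical feature templates, for illustration only.
    return [("w", words[i], tag), ("t", prev_tag, tag)]

def seq_features(words, seq):
    """Features of a (possibly partial) tag sequence, summed over its decisions."""
    feats = []
    for i, t in enumerate(seq):
        prev = seq[i - 1] if i > 0 else "<s>"
        feats.extend(features(words, i, prev, t))
    return feats

def score(weights, words, seq):
    # Recomputes the whole prefix each time: simple, not efficient.
    return sum(weights[f] for f in seq_features(words, seq))

def update(weights, words, good, bad):
    """Perceptron update: reward the gold prefix, penalize the predicted one."""
    for f in seq_features(words, good):
        weights[f] += 1.0
    for f in seq_features(words, bad):
        weights[f] -= 1.0

def train_step(weights, words, gold, tags, beam_size=2):
    """One pass over a sentence; returns True if no update was needed."""
    beam = [[]]
    for i in range(len(words)):
        candidates = [seq + [t] for seq in beam for t in tags]
        candidates.sort(key=lambda s: score(weights, words, s), reverse=True)
        beam = candidates[:beam_size]
        gold_prefix = gold[: i + 1]
        if gold_prefix not in beam:
            # Early update: the gold prefix fell off the beam, so update and stop here.
            update(weights, words, gold_prefix, beam[0])
            return False
    if beam[0] != gold:
        update(weights, words, gold, beam[0])
        return False
    return True

w = defaultdict(float)
data = [(["dogs", "bark"], ["NOUN", "VERB"]), (["cats", "sleep"], ["NOUN", "VERB"])]
for epoch in range(5):
    for words, gold in data:
        train_step(w, words, gold, TAGS)
print([train_step(w, words, gold, TAGS) for words, gold in data])  # [True, True]
```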

Other approaches include conditional random fields, hidden Markov models, max-margin Markov networks, maximum-entropy Markov models, and the one they cite, SEARN, along with the related learning-to-search methods DAgger (Ross et al., 2011), AggreVaTe (Ross & Bagnell, 2014) and LOLS (Chang et al., 2015).

http://arxiv.org/abs/1502.02206

However, the SyntaxNet team dismisses SEARN by implying that it suffers from label bias, and that claim isn't justified.

Their approach approximates the 0/1 (Hamming) loss with log-loss, and it works only on problems where the output decomposes into a sequence of decisions, each of which incurs its own incremental loss.
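
A small sketch of what that decomposition means, assuming a toy tagging task with hypothetical per-decision probability tables. The Hamming loss is a sum of per-decision 0/1 terms, and a per-decision log-loss measured in bits upper-bounds it: whenever the argmax decision is wrong, some other tag has probability at least p(gold), so p(gold) ≤ 0.5 and -log2 p(gold) ≥ 1. This is a generic illustration of a decomposable surrogate, not necessarily the exact objective SyntaxNet optimizes.

```python
import math

def hamming_loss(pred, gold):
    """Sum of per-decision 0/1 losses: one independent term per position."""
    return sum(int(p != g) for p, g in zip(pred, gold))

def log_loss_bits(probs, gold):
    """Per-decision log-loss in bits, -log2 p(gold decision), summed over positions."""
    return sum(-math.log2(dist[g]) for dist, g in zip(probs, gold))

def argmax_decode(probs):
    """Pick the most probable label independently at each position."""
    return [max(dist, key=dist.get) for dist in probs]

# Hypothetical per-position distributions for a two-word sentence.
probs = [{"NOUN": 0.9, "VERB": 0.1}, {"NOUN": 0.6, "VERB": 0.4}]
gold = ["NOUN", "VERB"]
pred = argmax_decode(probs)
print(pred)                                  # ['NOUN', 'NOUN']
print(hamming_loss(pred, gold))              # 1
print(round(log_loss_bits(probs, gold), 3))  # 1.474, which is >= 1
```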

LOLS can work with arbitrary loss functions. For example, say you are translating a text from English to Chinese: what is the loss when you've translated only half of the document? It's hard, if not impossible, to decompose that loss over individual decisions. (It's easy for part-of-speech tagging, where each tag is either correct or not, or for dependency parsing, where each word's parent is either correct or not.)
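
A toy illustration of why a whole-output loss resists decomposition, assuming a hypothetical BLEU-like bigram loss (not BLEU itself). Every position of `b c a` is wrong under the Hamming loss, yet it recovers half of the reference bigrams; and `a b c` and `a c b` make the same first decision but end up with losses 0 and 1, so no fixed cost can be assigned to that first decision before the rest of the output exists.

```python
def hamming_loss(output, reference):
    """Position-wise loss: decomposes into one 0/1 term per decision."""
    return sum(int(o != r) for o, r in zip(output, reference))

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

def bigram_loss(output, reference):
    """Hypothetical BLEU-like loss: 1 - fraction of reference bigrams found in the output.
    It is only defined on a complete output and depends on word order."""
    ref = bigrams(reference)
    if not ref:
        return 0.0 if output == reference else 1.0
    out = set(bigrams(output))
    return 1.0 - sum(1 for b in ref if b in out) / len(ref)

reference = ["a", "b", "c"]
print(hamming_loss(["b", "c", "a"], reference), bigram_loss(["b", "c", "a"], reference))  # 3 0.5
print(bigram_loss(["a", "b", "c"], reference), bigram_loss(["a", "c", "b"], reference))   # 0.0 1.0
```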

LOLS is a much superior method. It can also be combined with any binary classifier to guide the decisions: an SVM with any kernel, a perceptron, or logistic regression.
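
A heavily simplified sketch of LOLS-style training, assuming a toy tagging task, hypothetical features, gold tags as the reference policy, and a cost-sensitive perceptron standing in for whichever classifier you choose. It follows the roll-in / one-step deviation / roll-out structure of learning to search, but it is not the reference implementation from Chang et al. (2015); details such as how roll-out mixtures and deviation points are chosen may differ there.

```python
import random
from collections import defaultdict

ACTIONS = ["NOUN", "VERB"]

def feats(words, i, prev):
    # Hypothetical feature templates: current word and previous decision.
    return [("w", words[i]), ("p", prev)]

class CostSensitivePerceptron:
    """Stand-in classifier: one linear scorer per action, trained from cost vectors."""

    def __init__(self, actions):
        self.actions = actions
        self.w = defaultdict(float)

    def predict(self, fs):
        return max(self.actions, key=lambda a: sum(self.w[(f, a)] for f in fs))

    def update(self, fs, costs):
        pred, best = self.predict(fs), min(costs, key=costs.get)
        if costs[pred] > costs[best]:  # update only when the prediction is strictly worse
            for f in fs:
                self.w[(f, best)] += 1.0
                self.w[(f, pred)] -= 1.0

def run(words, prefix, policy):
    """Extend a partial output to full length by following `policy`."""
    out = list(prefix)
    for i in range(len(out), len(words)):
        out.append(policy(words, i, out[-1] if out else "<s>"))
    return out

def hamming(output, gold):
    return sum(int(o != g) for o, g in zip(output, gold))

def lols_step(clf, words, gold, loss, beta=0.5, rng=random):
    """One simplified LOLS-style update on a single sentence."""
    learned = lambda ws, i, prev: clf.predict(feats(ws, i, prev))
    reference = lambda ws, i, prev: gold[i]  # assumption: gold tags act as the reference policy
    t = rng.randrange(len(words))            # deviation point
    prefix = run(words[:t], [], learned)     # roll-in with the learned policy
    prev = prefix[-1] if prefix else "<s>"
    costs = {}
    for a in ACTIONS:                        # try every one-step deviation at position t
        rollout = reference if rng.random() < beta else learned  # mix reference and learned roll-outs
        costs[a] = loss(run(words, prefix + [a], rollout), gold)  # loss of the complete output
    clf.update(feats(words, t, prev), costs)

def decode(clf, words):
    return run(words, [], lambda ws, i, prev: clf.predict(feats(ws, i, prev)))

rng = random.Random(0)
clf = CostSensitivePerceptron(ACTIONS)
data = [(["dogs", "bark"], ["NOUN", "VERB"]), (["cats", "sleep"], ["NOUN", "VERB"])]
for _ in range(50):
    for words, gold in data:
        lols_step(clf, words, gold, hamming, rng=rng)
print(decode(clf, ["dogs", "bark"]))  # ['NOUN', 'VERB']
```

Because `loss` only ever sees complete outputs, a sequence-level loss could be passed in place of `hamming` without touching anything else, and swapping the perceptron for another cost-sensitive learner is similarly local.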

SyntaxNet uses a non-recurrent (feed-forward) network to learn the parsing policy, and the same network could be used within the LOLS approach.

Structured prediction is an incredibly interesting problem. For example, doing POS tagging and dependency parsing at the same time can improve performance on both tasks, and the same should hold when recognizing named entities while simultaneously extracting the relations between them.

It's very nice to see past insights applied with heavier machinery. Exciting times!