
> IMO RNNs do need some kind of structured loss (more than per step likelihood) to be competitive with HMM approaches using Viterbi decoding

This is exactly what they do in the dependency parser I cited, so your opinion is definitely valid. Their approach is not general, though: they approximate Hamming loss with log-loss, and again it only works on sequences.

http://arxiv.org/abs/1502.02206

The paper above also has a very good analysis of how to remove the search component from inference, giving linear time complexity with competitive results.
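To make "no search component" concrete, here is a minimal sketch of greedy left-to-right decoding in general. It is not the paper's model: `greedy_decode`, `toy_policy` and the label names are made up for illustration, and the toy rule stands in for whatever trained classifier you would actually use. Each position gets exactly one classifier call, so inference is linear in the sequence length, with no Viterbi-style dynamic program over label pairs.

```python
from typing import Callable, List, Sequence

# Hypothetical policy signature: (tokens, position, labels chosen so far) -> label
Policy = Callable[[Sequence[str], int, List[str]], str]


def greedy_decode(tokens: Sequence[str], policy: Policy) -> List[str]:
    """Label tokens left to right, committing to each decision immediately."""
    labels: List[str] = []
    for i in range(len(tokens)):
        # One policy call per token: O(len(tokens)) calls in total.
        labels.append(policy(tokens, i, labels))
    return labels


def toy_policy(tokens: Sequence[str], i: int, history: List[str]) -> str:
    """Toy rule standing in for a trained classifier (assumption, not a real model)."""
    if tokens[i][0].isupper():
        inside = bool(history) and history[-1] in ("B-NAME", "I-NAME")
        return "I-NAME" if inside else "B-NAME"
    return "O"


print(greedy_decode(["Hal", "Daume", "wrote", "it"], toy_policy))
# -> ['B-NAME', 'I-NAME', 'O', 'O']
```

The policy can look at everything it has already predicted, which is what lets a greedy decoder stay competitive without an explicit search.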

A consistent reduction (in the sense used in the machine learning theory of reductions) from structured learning to multiclass classification seems to be possible. I just haven't seen anyone couple the learning procedure with neural networks. (Daume did mention that they trained RNNs with the reductionist approach, but it seems the code didn't make it into Vowpal Wabbit.)

The approach above works with any loss you want (from F-score to any weird thing you might think of). The loss doesn't have to decompose over the structure: one can just announce the loss after the labelling is done and learn from that. It can work on any kind of structure, from images to sequences to documents for translation. It can also use an O(log n) consistent reduction of multiclass classification if speed is an issue and the number of classes is large. It can easily work as an online method too, without requiring the full structured input.
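A rough sketch of what "announce the loss after the labelling is done" can look like, in the learning-to-search style. This is my own illustration, not Vowpal Wabbit code: `f1_loss` and `rollout_costs` are hypothetical names, and I'm assuming the roll-out simply copies the gold labels for the remaining positions. At one position you try every label, finish the sequence, score the whole labelling with a loss that does not decompose (F1 here), and the per-label costs become one cost-sensitive multiclass example.

```python
from typing import Callable, Dict, List, Sequence


def f1_loss(pred: Sequence[str], gold: Sequence[str], positive: str = "X") -> float:
    """1 - F1 of the `positive` label; only defined on a complete labelling."""
    tp = sum(p == positive and g == positive for p, g in zip(pred, gold))
    n_pred = sum(p == positive for p in pred)
    n_gold = sum(g == positive for g in gold)
    if n_pred == 0 and n_gold == 0:
        return 0.0
    if tp == 0:
        return 1.0
    precision, recall = tp / n_pred, tp / n_gold
    return 1.0 - 2 * precision * recall / (precision + recall)


def rollout_costs(
    prefix: Sequence[str],
    gold: Sequence[str],
    labels: Sequence[str],
    loss: Callable[[Sequence[str], Sequence[str]], float],
) -> Dict[str, float]:
    """Cost of each action at position len(prefix).

    Take the action, complete the sequence with the reference roll-out
    (assumed here: copy the gold labels), then score the full labelling.
    """
    t = len(prefix)
    costs: Dict[str, float] = {}
    for action in labels:
        full: List[str] = list(prefix) + [action] + list(gold[t + 1:])
        costs[action] = loss(full, gold)
    best = min(costs.values())
    # Subtract the best cost so the cheapest action has cost 0.
    return {a: c - best for a, c in costs.items()}


gold = ["X", "O", "X"]
prefix = ["O"]  # the learned policy already made a mistake at position 0
print(rollout_costs(prefix, gold, labels=["O", "X"], loss=f1_loss))
```

Any cost-sensitive multiclass learner can consume those examples, which is the spot where a neural network could be plugged in.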

For example, simple sequence tagging runs at around 500k tokens per second (depending on the number of possible labels) :D Word count is only 2-4 times faster than that :D

There still aren't any papers using the above consistent reduction in the framework of NNs, but I guess they'll be coming soon.