This is implied by research on reductions in machine learning: simple models can solve complex tasks.

For example, you can do multiclass classification, cost-sensitive (importance-weighted) multiclass and binary classification, quantile regression, and structured prediction (as is done with HMMs, CRFs, MEMMs, structured SVMs, etc.) using nothing but a binary classifier.

So if your implementation of that binary classifier is efficient and performant, then, provided your reduction is consistent, you'll be efficient and performant on any of the above tasks.
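To make the idea concrete, here is a minimal sketch of the simplest such reduction, one-against-all, which turns multiclass classification into k binary problems. The `Perceptron` and `OneAgainstAll` classes are names I made up for illustration (this is not VW's code), and the perceptron is just a stand-in for whatever efficient binary learner you already have.

```python
from typing import List, Sequence


class Perceptron:
    """Tiny binary learner; a stand-in for any efficient binary classifier."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def score(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x: Sequence[float], y: int) -> None:
        # y is +1 or -1; only update on a mistake (classic perceptron rule).
        if y * self.score(x) <= 0:
            for i, xi in enumerate(x):
                self.w[i] += y * xi
            self.b += y


class OneAgainstAll:
    """Multiclass via k binary problems: 'is it class j, or anything else?'"""

    def __init__(self, num_classes: int, dim: int) -> None:
        self.learners = [Perceptron(dim) for _ in range(num_classes)]

    def fit(self, X: List[Sequence[float]], labels: List[int], epochs: int = 20) -> None:
        for _ in range(epochs):
            for x, label in zip(X, labels):
                for j, learner in enumerate(self.learners):
                    learner.update(x, 1 if j == label else -1)

    def predict(self, x: Sequence[float]) -> int:
        # The multiclass answer is the class whose binary learner is most confident.
        scores = [learner.score(x) for learner in self.learners]
        return max(range(len(scores)), key=scores.__getitem__)


# Toy data: three classes with one-hot features.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = [0, 1, 2]

model = OneAgainstAll(num_classes=3, dim=3)
model.fit(X, labels)
predictions = [model.predict(x) for x in X]
```

The multiclass code never looks inside the binary learner, so any speedup in `Perceptron` carries over unchanged. Note that this particular reduction costs k binary evaluations per prediction.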

What the authors of the paper above did was rediscover some old tricks and drop the theory of reductions, and that's that, all without referencing Vowpal Wabbit, which does way more useful tricks. I'm not sure why, because the VW team consistently references Leon Bottou (among others), who is a member of FAIR and has been using implementation tricks for decades.

Their log(k) implementation is probably less performant than the consistent one-against-some reduction in VW, since the latter has better theoretical bounds on performance.
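For anyone wondering where log(k) comes from, here is a rough sketch of a balanced label tree, where each internal node is a binary classifier that routes an example left or right. To be clear, this is neither the paper's implementation nor VW's one-against-some reduction; `LabelTree` and `Perceptron` are hypothetical names, and the fixed balanced split is an assumption made purely to show the cost per prediction.

```python
from typing import List, Optional, Sequence, Tuple


class Perceptron:
    """Tiny binary learner; a stand-in for any efficient binary classifier."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def score(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x: Sequence[float], y: int) -> None:
        # y is +1 or -1; only update on a mistake.
        if y * self.score(x) <= 0:
            for i, xi in enumerate(x):
                self.w[i] += y * xi
            self.b += y


class LabelTree:
    """Balanced binary tree over labels; each internal node owns one binary learner."""

    def __init__(self, labels: List[int], dim: int) -> None:
        self.labels = labels
        self.learner: Optional[Perceptron] = None
        self.left: Optional["LabelTree"] = None
        self.right: Optional["LabelTree"] = None
        if len(labels) > 1:
            mid = len(labels) // 2
            self.learner = Perceptron(dim)
            self.left = LabelTree(labels[:mid], dim)
            self.right = LabelTree(labels[mid:], dim)

    def update(self, x: Sequence[float], label: int) -> None:
        if self.learner is None:
            return  # leaf: nothing to train
        go_right = label in self.right.labels
        self.learner.update(x, 1 if go_right else -1)
        # Only the node on the true label's path sees this example.
        (self.right if go_right else self.left).update(x, label)

    def predict(self, x: Sequence[float]) -> Tuple[int, int]:
        """Return (predicted label, number of binary decisions made)."""
        node, steps = self, 0
        while node.learner is not None:
            node = node.right if node.learner.score(x) > 0 else node.left
            steps += 1
        return node.labels[0], steps


# Toy data: k = 8 classes with one-hot features.
k = 8
X = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
labels = list(range(k))

tree = LabelTree(labels, dim=k)
for _ in range(50):
    for x, label in zip(X, labels):
        tree.update(x, label)

results = [tree.predict(x) for x in X]
```

With k = 8 every prediction takes 3 binary decisions instead of the 8 a one-against-all scheme would need. The catch is that one wrong decision near the root cannot be undone further down, which is why the theoretical guarantees of the reduction matter.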