by throwaway_bob 3631 days ago
Note that the fact that this can be easily accomplished in VW doesn't really take away from the point the authors are trying to make; namely, simple models done carefully are nearly as effective as (or more effective than) fancy deep models on these sorts of problems, but much cheaper to train and test.
2 comments

I can attest to that: I used an even simpler algorithm with bigram hashing to generate user profiles for the http://news-AI.com personalized news service.
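For anyone wondering what bigram hashing for a profile might look like, here is a minimal sketch. It is not the actual news-AI.com code; the function names (`bigram_hash_features`, `build_profile`, `score`), the CRC32 hash, and the bucket count are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def bigram_hash_features(text, num_buckets=2**18):
    """Map each adjacent word pair to a hashed bucket and count occurrences."""
    tokens = text.lower().split()
    counts = Counter()
    for left, right in zip(tokens, tokens[1:]):
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        bucket = zlib.crc32(f"{left} {right}".encode("utf-8")) % num_buckets
        counts[bucket] += 1
    return counts


def build_profile(read_articles, num_buckets=2**18):
    """Sum the hashed bigram counts of every article the user has read."""
    profile = Counter()
    for text in read_articles:
        profile.update(bigram_hash_features(text, num_buckets))
    return profile


def score(profile, text, num_buckets=2**18):
    """Dot product between the user profile and a candidate article."""
    feats = bigram_hash_features(text, num_buckets)
    return sum(profile[bucket] * count for bucket, count in feats.items())
```

The profile is just a sparse vector of bucket counts, so memory is bounded by `num_buckets` no matter how large the vocabulary grows; hash collisions add a little noise to the score.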

Surprisingly, it produced satisfactory results with much smaller CPU requirements.

This is implied by research on reductions in machine learning: simple models can solve complex tasks.

For example, you can do multiclass classification, cost-sensitive (importance-weighted) multiclass and binary classification, quantile regression, and structured prediction (as is done with HMMs, CRFs, MEMMs, structured SVMs, etc.) using just a binary classifier.

So, if your implementation of that binary classifier is efficient and performant, you'll be (given that your reduction is consistent) efficient and performant on any of the above tasks.
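To make the reduction idea concrete, here is a minimal sketch of the simplest one, one-against-all, which turns a k-class problem into k binary problems. This is not VW's code and not the one-against-some or log(k) schemes; the `Perceptron` and `OneAgainstAll` classes are hypothetical names, and the perceptron is only a stand-in for whatever efficient binary learner you have.

```python
class Perceptron:
    """Plain binary perceptron; labels are -1 or +1."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        # Classic mistake-driven update: only change weights on an error.
        if y * self.score(x) <= 0:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y


class OneAgainstAll:
    """Reduce k-class classification to k binary 'class c vs. the rest' problems."""

    def __init__(self, num_classes, dim):
        self.learners = [Perceptron(dim) for _ in range(num_classes)]

    def fit(self, X, labels, epochs=20):
        for _ in range(epochs):
            for x, label in zip(X, labels):
                for c, learner in enumerate(self.learners):
                    learner.update(x, 1 if c == label else -1)

    def predict(self, x):
        # Pick the class whose binary learner is most confident.
        scores = [learner.score(x) for learner in self.learners]
        return scores.index(max(scores))


X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = [0, 1, 2]
model = OneAgainstAll(num_classes=3, dim=3)
model.fit(X, labels)
predictions = [model.predict(x) for x in X]
```

The multiclass wrapper never looks inside the binary learner, so any speedup in `Perceptron.update` carries over directly. That is the sense in which an efficient binary classifier gives you an efficient multiclass one.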

What the authors of the paper above did was rediscover some old tricks and leave out the theory of reductions, and that's that, without referencing Vowpal Wabbit, which does far more useful tricks. I'm not sure why, because the VW team consistently references Leon Bottou (among others), who is a member of FAIR and has been using implementation tricks for decades.

Their log(k) implementation is probably less performant than the one-against-some consistent reduction in VW due to the latter having better theoretical bounds on performance.