There are models between a 2-gram model and a 600M-parameter model that would be good options. I don't expect a 2-gram model to do very well here. Also, I'm not sure why this model isn't a fine choice if it solves their problem.
As a follow-up to the original article, I added a new experiment using logistic regression, and the results are very good: it actually improves accuracy by a few points.
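For anyone wondering what a middle ground between those extremes looks like, logistic regression over unigram and bigram counts is a classic baseline. Below is a minimal, dependency-free sketch of that idea. The toy sentiment data, the helper names (`ngrams`, `build_vocab`, `train`, `predict`), and the hyperparameters (`epochs`, `lr`, `l2`) are all made up for illustration. They are not the setup or data from the article or the experiment above.

```python
import math
from collections import Counter


def ngrams(text):
    """Unigrams plus bigrams of a lowercased, whitespace-tokenized string."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def build_vocab(texts):
    """Map every n-gram seen in training to a feature index."""
    vocab = {}
    for text in texts:
        for gram in ngrams(text):
            vocab.setdefault(gram, len(vocab))
    return vocab


def featurize(text, vocab):
    """Sparse count vector {feature_index: count}; unseen n-grams are dropped."""
    counts = Counter(g for g in ngrams(text) if g in vocab)
    return {vocab[g]: c for g, c in counts.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b):
    """Linear score w . x + b for a sparse feature dict."""
    return b + sum(w[i] * v for i, v in x.items())


def train(texts, labels, epochs=200, lr=0.5, l2=1e-3):
    """Binary logistic regression fit with plain SGD on the log-loss."""
    vocab = build_vocab(texts)
    w = [0.0] * len(vocab)
    b = 0.0
    data = [(featurize(t, vocab), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(score(x, w, b)) - y  # d(log-loss)/d(score)
            for i, v in x.items():
                # L2 shrinkage applied only to weights active in this example
                # (a common shortcut for sparse features).
                w[i] -= lr * (err * v + l2 * w[i])
            b -= lr * err
    return vocab, w, b


def predict(model, text):
    """Return 1 if P(positive) >= 0.5, else 0."""
    vocab, w, b = model
    return 1 if sigmoid(score(featurize(text, vocab), w, b)) >= 0.5 else 0


# Hypothetical toy data (1 = positive, 0 = negative), purely for illustration.
texts = ["great movie", "awful movie",
         "loved the acting", "hated the acting",
         "great plot and loved it", "awful plot and hated it"]
labels = [1, 0, 1, 0, 1, 0]

model = train(texts, labels)
print(predict(model, "great acting"), predict(model, "awful acting"))
```

In a real experiment you'd probably skip the hand-rolled SGD and use scikit-learn, e.g. a `CountVectorizer(ngram_range=(1, 2))` feeding `LogisticRegression()`, and evaluate on a held-out split rather than the training sentences.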