
Cross-validating the classifier's hyperparameters and using a good scoring metric (such as the Matthews correlation coefficient) go a long way. Since the classes are very imbalanced, an appropriate scoring metric is very important: plain accuracy can look excellent for a classifier that never predicts the rare class. Even more importantly, train with lots of high-quality data whenever possible. Anecdotally, many people seem to obsess over the particular classification algorithm while neglecting data quality. A classifier is only ever as good as its training set.
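To make the metric point concrete, here is a minimal pure-Python sketch comparing accuracy with the Matthews correlation coefficient (MCC) on an imbalanced label set. The data is made up for illustration (95 negatives, 5 positives), and the two "classifiers" are hypothetical prediction lists, not real models. Returning 0 when the denominator is zero is a common convention that I am assuming here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient, in [-1, 1]."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: score 0 when a row or column of the
    # confusion matrix is empty (e.g. one class is never predicted).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up imbalanced labels: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Hypothetical classifier A: always predicts the majority class.
always_negative = [0] * 100

# Hypothetical classifier B: 2 false alarms, finds 4 of the 5 positives.
some_skill = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

for name, y_pred in [("always_negative", always_negative),
                     ("some_skill", some_skill)]:
    print(name, "accuracy:", accuracy(y_true, y_pred),
          "mcc:", round(mcc(y_true, y_pred), 3))
```

The always-negative predictor scores high on accuracy yet gets an MCC of zero, while the predictor that actually finds positives is clearly separated by MCC. If you use scikit-learn, `sklearn.metrics.matthews_corrcoef` computes the same quantity and can be used as the scoring function during cross-validation.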