
Unfortunately, we don't have much experience with industry-sized datasets, simply because we don't have them. I know that our SVMs are among the fastest in the world. At least we beat LIBSVM and LIBLINEAR (and certainly on very big datasets!). But we lack support for Hadoop/MPI, even though we would like to change that in the future.

Right now I would say that the main focus of Shark is research. That is, we want to be fast but also modular, so that we can still easily swap out different aspects of the algorithms for our own work. As these goals sometimes clash, it is hard to claim that we are the fastest, simply because nearly every algorithm can be made faster when you know exactly which combination of model, loss function, and training algorithm you use. But we are (hopefully) reasonably fast and certainly want to improve.