Hacker News
by azeirah 1144 days ago
I'm closely following their GitHub discussions and PRs.

The primary bottleneck for now is compute.

They've recently made a big performance improvement by adding partial GPU acceleration when you compile against a GPU-accelerated BLAS: either cuBLAS (Nvidia) or CLBlast, which is slightly slower but runs on almost everything (Nvidia, Apple, AMD, mobile, Raspberry Pi, etc.).
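For anyone unfamiliar with BLAS: the routine a GPU BLAS usually takes over is general matrix multiplication (GEMM), which computes C := alpha * A * B + beta * C. Below is a minimal, unoptimized CPU sketch of that operation. Two assumptions to flag: the function name `gemm_ref` is hypothetical and is not the cuBLAS or CLBlast API, and treating GEMM as the offloaded work is my reading, since the comment doesn't say exactly which operations run on the GPU. It uses Python floats for readability, whereas the single-precision SGEMM routine works in 32-bit floats.

```python
def gemm_ref(alpha, A, B, beta, C):
    """Hypothetical reference GEMM (not the cuBLAS or CLBlast API).

    Computes C := alpha * A @ B + beta * C in place, the operation a BLAS
    GEMM routine performs, for row-major lists of lists with no transposes.
    A is m x k, B is k x n, C is m x n.
    """
    m, k = len(A), len(B)
    n = len(B[0])
    for i in range(m):
        if len(A[i]) != k:
            raise ValueError("inner dimensions of A and B must match")
        for j in range(n):
            # Dot product of row i of A with column j of B.
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            # As in reference BLAS, C is not read when beta == 0.
            C[i][j] = alpha * acc if beta == 0 else alpha * acc + beta * C[i][j]
    return C


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
B = [[7.0, 8.0],
     [9.0, 10.0],
     [11.0, 12.0]]
C = [[0.0, 0.0],
     [0.0, 0.0]]
print(gemm_ref(1.0, A, B, 0.0, C))  # [[58.0, 64.0], [139.0, 154.0]]
```

A GPU BLAS produces the same result, but because every (i, j) entry is an independent dot product, it can compute them in parallel, which is why large matrix multiplications benefit so much from the offload.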