by Radim
3108 days ago
Getting better, obviously, but the feet-on-the-ground experience with MLlib is still far from pleasant: hard to configure, hard to manage, hard to scale, hard to debug. By way of anecdote, Spark's MLlib used to contain an implementation of word2vec that failed when used on more than 2 billion words (some arcane integer overflow). So much for scale! As for performance, in 2016 the break-even point where a Spark cluster started being competitive with a single-machine implementation was around 12 Spark machines (a bit of a hindrance to rapid iterative development, which is the cornerstone of R&D): https://radimrehurek.com/florence15.pdf