by disgruntledphd2
2179 days ago
Not the OP, but moving to sparse matrices is probably going to give you the most bang for your buck. I strongly suspect those huge dataframes could be encoded much more efficiently in a sparse format. To be fair, that's one of the reasons the Spark ML stuff works quite well. Be warned, though: estimating how long a Spark job will take, or how many resources it will need, is a dark, dark art.
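For a rough idea of the savings, here's a minimal sketch in Python with SciPy. It assumes the data is mostly zeros (e.g. one-hot encoded categoricals). The matrix size and the one-nonzero-per-row layout are made up for illustration, so your own numbers will depend on how sparse your data actually is.

```python
import numpy as np
from scipy import sparse

# Hypothetical data: 10,000 rows x 1,000 columns, one nonzero per row
# (like a single one-hot encoded categorical), i.e. 99.9% zeros.
rng = np.random.default_rng(0)
n_rows, n_cols = 10_000, 1_000
rows = np.arange(n_rows)
cols = rng.integers(0, n_cols, size=n_rows)
vals = np.ones(n_rows, dtype=np.float64)

# Build the CSR matrix directly, without ever allocating the dense array.
# CSR stores only the nonzero values, their column indices, and row pointers.
csr = sparse.csr_matrix((vals, (rows, cols)), shape=(n_rows, n_cols))

# Memory the equivalent dense float64 array would need.
dense_bytes = n_rows * n_cols * np.dtype(np.float64).itemsize
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes

print(f"dense:  {dense_bytes / 1e6:.1f} MB")
print(f"sparse: {sparse_bytes / 1e6:.3f} MB")
```

The win scales with the fraction of zeros. If most of your columns are dense, CSR can end up larger than the plain array, because it also stores an index for every value.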