| HN Mirror



	by disgruntledphd2 2179 days ago
	Not the OP, but moving to sparse matrices is probably going to give you the most bang for your buck. I strongly suspect those huge dataframes could be encoded much more efficiently in a sparse format. To be fair, that's one of the reasons the Spark ML stuff works quite well. Be warned though: estimating how long a Spark job will take, or how many resources it will need, is a dark, dark art.
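
	To make the sparse idea concrete, here is a rough sketch of compressed sparse row (CSR) encoding in plain Python. The data and the helper names (`to_csr`, `csr_row`) are made up for illustration; in practice you would most likely reach for a library type such as `scipy.sparse.csr_matrix` rather than rolling your own.

```python
def to_csr(rows):
    """Encode a dense list-of-lists as a CSR triple (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero value itself
                indices.append(col)   # the column it lives in
        indptr.append(len(data))      # where the next row starts in `data`
    return data, indices, indptr


def csr_row(csr, i, n_cols):
    """Rebuild dense row i from the CSR triple."""
    data, indices, indptr = csr
    row = [0] * n_cols
    for k in range(indptr[i], indptr[i + 1]):
        row[indices[k]] = data[k]
    return row


# Hypothetical one-hot-ish data: 4 rows, 6 columns, mostly zeros.
dense = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 2],
    [0, 0, 0, 0, 0, 0],
    [3, 0, 0, 0, 4, 0],
]

csr = to_csr(dense)
dense_cells = sum(len(r) for r in dense)   # every cell stored
stored = sum(len(part) for part in csr)    # values + column ids + row pointers
print(dense_cells, stored)
```

	The saving grows with the fraction of zeros: storage is roughly proportional to the number of non-zero entries plus the number of rows, rather than rows times columns. If the data is not actually mostly zeros, this buys you nothing.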