by tlarkworthy
4722 days ago
It's random forests ... each tree is trained on a subset of the data. You can split the massive dataset into chunks and train on each chunk independently. That sidesteps the "big data" hangup. If you look at the scikit-learn implementation, each tree emits a normalised probability vector for each prediction, and those vectors are simply averaged to get the aggregate prediction, so it's not very difficult to do yourself. Regardless, you are still applying a batch learning technique. You want an incremental learner for big data.
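For what it's worth, here is a rough sketch of the "do it yourself" aggregation step, assuming averaging as the combination rule (which is how scikit-learn documents its forest's predict_proba). The names `average_proba`, `predict_class` and `chunk_outputs` are made up for illustration, and the numbers are invented, not real model output. It also assumes every per-chunk model reports the same classes in the same order.

```python
from typing import List, Sequence


def average_proba(per_model_proba: Sequence[Sequence[float]]) -> List[float]:
    """Average class-probability vectors from independently trained models."""
    if not per_model_proba:
        raise ValueError("need at least one probability vector")
    n_classes = len(per_model_proba[0])
    if any(len(p) != n_classes for p in per_model_proba):
        raise ValueError("all vectors must cover the same classes in the same order")
    n_models = len(per_model_proba)
    # Element-wise mean; the result still sums to 1 if every input does.
    return [sum(p[k] for p in per_model_proba) / n_models for k in range(n_classes)]


def predict_class(per_model_proba: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = average_proba(per_model_proba)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs for one sample from three forests,
# each trained on a different chunk of the dataset.
chunk_outputs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.1, 0.3],
]
avg = average_proba(chunk_outputs)
label = predict_class(chunk_outputs)
```

A plain mean treats every chunk equally. If the chunks differ a lot in size or in the number of trees, you would probably want to weight each vector accordingly.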
Although I'm a big believer in streaming/online machine learning, it's not necessarily the best solution. There are many cases when batch is the better option, especially for big data. Anything historical, really.