by barneso 3928 days ago
MLDB seems like a decent fit for this use case. You could run pre-processing continuously in the background, and predictions could do a significant amount of work on demand. Depending on the size of the overall training set, you might need to spin up a larger server for an hour or so to retrain a model, but if you set the system up right, the model would only need to be retrained infrequently, since most of the work would be done online. The more pre-processing you can do in the background, the richer and smaller the data that goes into the training phase.

MLDB can memory-map some kinds of datasets, which would also help with the low memory-to-dataset-size ratio.
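
To illustrate why memory-mapping helps when the dataset is larger than RAM, here is a rough sketch in plain Python. It is not MLDB's API or file format: the record layout (one little-endian float64 per record) and the helper names `write_dataset` and `mapped_mean` are made up for the example. The point is only that the OS pages data in as it is touched, so the process never holds the whole file in memory at once.

```python
import mmap
import struct

# Hypothetical on-disk layout for this sketch: one little-endian float64 per record.
RECORD = struct.Struct("<d")


def write_dataset(path, values):
    """Write values to disk in the fixed-width layout above."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def mapped_mean(path):
    """Average all records by reading them through a read-only memory map.

    Only the pages currently being read need to be resident, so the file
    can be much larger than available RAM. The file must be non-empty,
    since an empty file cannot be mapped.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = len(mm) // RECORD.size
            total = 0.0
            for i in range(count):
                # unpack_from reads directly from the mapping at a byte offset
                (value,) = RECORD.unpack_from(mm, i * RECORD.size)
                total += value
            return total / count
```

Something like `write_dataset("data.bin", values)` followed by `mapped_mean("data.bin")` would exercise it. A real system would use a columnar or otherwise indexed format, but the paging behaviour is the same idea.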

Please feel free to reach out (jeremy at datacratic) if you'd like to discuss further.