by x0x0
4724 days ago
You should check out http://0xdata.com/ ; it's built from the ground up on a custom distributed key-value store (DKV) to do in-memory ML. Reasons to check it out:

1 - It's open source: https://github.com/0xdata/h2o

2 - It ingests data from HDFS, S3, and CSV.

3 - I've built systems like what you're discussing twice. The ML algorithms are often easier to write than expected, while data management (moving data, sending updates, etc.), which initially seems easier, is much harder. 0xdata handles this for you.

4 - It's under active development.

5 - It runs cleanly on your dev box with one or many nodes for development; deploying is as simple as uploading a jar to a cluster and putting a single file on each node naming the peers in the cluster.

5a - There are scripts to walk you through doing this.

Disclosure: I work on it as of very recently =P