

by x0x0 4772 days ago

you should check out http://0xdata.com/; it's built from the ground up on a custom distributed key-value store (dkv) to do in-memory ML. Reasons to check it out:

1 - it's open source https://github.com/0xdata/h2o

2 - ingest data from hdfs, s3, csv

3 - I've built systems like what you're discussing twice; the ML algorithms are often easier to write than expected, while data management (moving data, sending updates, etc.), which initially seems easier, is much harder. 0xdata handles this for you.

4 - under active development

5 - it runs cleanly on your dev box with one or many nodes for development; deploying is as simple as uploading a jar to a cluster and putting a single file on each node that names the peers in the cluster

5a - see the scripts that walk you through doing this

disclosure: I work on it as of very recently =P