by henrythe9th 4724 days ago
Thanks for the suggestion. We've actually thought about just writing a multithreaded system on a single machine. What type of in-memory storage would you recommend in this case? Ideally it could later be extended to a distributed cluster of machines if one really large machine becomes too expensive.

Thanks

1 comment

I suggest storing your data in files and just memory-mapping them during start-up. On the JVM a single mapping can't exceed 2GB (a MappedByteBuffer is indexed by int), so just create logical shards and map them independently.
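A minimal sketch of the sharded mapping, assuming the shards are consecutive regions of one file (separate files per shard would work the same way). The class name `ShardedMmap`, the helper names, and the 1 GiB default shard size are illustrative choices, not something from the thread.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public final class ShardedMmap {
    // Hypothetical default; any value up to Integer.MAX_VALUE bytes is allowed per mapping.
    public static final long DEFAULT_SHARD_BYTES = 1L << 30; // 1 GiB

    /** Maps the file read-only as consecutive regions of at most shardBytes each. */
    public static List<MappedByteBuffer> mapShards(Path file, long shardBytes) throws IOException {
        if (shardBytes <= 0 || shardBytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("shardBytes must be in 1..Integer.MAX_VALUE");
        }
        List<MappedByteBuffer> shards = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += shardBytes) {
                long len = Math.min(shardBytes, size - pos); // last shard may be shorter
                shards.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, len));
            }
        }
        // The mappings stay valid after the channel is closed.
        return shards;
    }

    /** Reads one byte at a global file offset by locating its shard and local offset. */
    public static byte byteAt(List<MappedByteBuffer> shards, long shardBytes, long offset) {
        int shard = (int) (offset / shardBytes);
        int local = (int) (offset % shardBytes);
        return shards.get(shard).get(local);
    }
}
```

If records are fixed-width, picking a shard size that is a multiple of the record size keeps any record from straddling two shards.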

Since your iterative algorithms will mostly be scanning all records, storing them in a separate in-memory DB makes no sense: every access would have to call out to an external process over a socket.
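To illustrate what an in-process scan over mapped shards could look like, here is a rough sketch. The record layout (one long id followed by one double value, in the buffer's default big-endian order) and the names `RecordScan` and `sumValues` are assumptions made for the example; the thread does not describe the actual data.

```java
import java.nio.ByteBuffer;
import java.util.List;

public final class RecordScan {
    // Hypothetical fixed-width layout: 8-byte long id, then 8-byte double value.
    public static final int RECORD_BYTES = Long.BYTES + Double.BYTES;

    /** Sums the value field of every complete record across all shards. */
    public static double sumValues(List<? extends ByteBuffer> shards) {
        double total = 0.0;
        for (ByteBuffer shard : shards) {
            int records = shard.limit() / RECORD_BYTES;
            for (int i = 0; i < records; i++) {
                // Absolute read: does not move the buffer's position,
                // so the same shard can be scanned again on the next iteration.
                total += shard.getDouble(i * RECORD_BYTES + Long.BYTES);
            }
        }
        return total;
    }
}
```

Because a MappedByteBuffer is a ByteBuffer, the same loop runs over mapped shards with no copying and no socket round trip; each pass of an iterative algorithm is just another scan over memory the OS already has paged in.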

You can then use a framework like ZooKeeper or Akka for managing nodes in the event that you have to scale out. Even a simple master/slave set-up using Thrift services will do.