by verdverm 2709 days ago

Some things I might try...

    1. Hadoop / HDFS / Spark on an ephemeral cluster with disk snapshots
    2. Group the 1M IDs into a single file
    3. If the analysis runs once a month, save the data daily, then prep it right before the analysis.
    4. Consider using a Cassandra database
    5. Rent a big machine where the data can fit into memory
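Suggestions 2 and 3 could be combined roughly like this. It is a minimal Python sketch that assumes each record is an (ID, value) pair. The layout (one CSV file per day holding every ID, instead of one file per ID) and the function names `save_daily` and `prep_for_analysis` are hypothetical and do not come from the original question.

```python
import csv
import os
import tempfile
from collections import defaultdict


def save_daily(directory, day, records):
    """Write one day's records for all IDs into a single CSV file.

    Hypothetical layout: records are (id, value) pairs and each day
    gets one file named <day>.csv, rather than one file per ID.
    """
    path = os.path.join(directory, f"{day}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for record_id, value in records:
            writer.writerow([record_id, value])
    return path


def prep_for_analysis(directory):
    """Right before the monthly analysis, read every daily file and group values by ID."""
    by_id = defaultdict(list)
    # Sorting the file names keeps each ID's values in day order.
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".csv"):
            continue
        with open(os.path.join(directory, name), newline="") as f:
            for record_id, value in csv.reader(f):
                by_id[record_id].append(float(value))
    return dict(by_id)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        save_daily(d, "day01", [("a", 1.0), ("b", 2.0)])
        save_daily(d, "day02", [("a", 3.0)])
        print(prep_for_analysis(d))  # {'a': [1.0, 3.0], 'b': [2.0]}
```

Writing one file per day keeps the file count at roughly 30 per month instead of 1M, and the cost of grouping by ID is paid only once, right before the analysis.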