by verdverm 2709 days ago
Some things I might try...

    1. Hadoop / HDFS / Spark on an ephemeral cluster with disk snapshots
    2. Group 1M IDs into a single file
    3. If analysis is once a month, save the data daily, then prep it right before the analysis.
    4. Consider using Cassandra
    5. Rent a big machine where the data can fit into memory
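
For suggestion 2, here is a rough sketch of what I mean by grouping IDs into files. It assumes the IDs are integers and that one JSON object per line is an acceptable on-disk format; neither is stated above, and the names (`bucket_path`, `append_records`, `read_id`, the `bucket_NNNNNN.jsonl` naming) are made up for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict

IDS_PER_FILE = 1_000_000  # group size from suggestion 2; an assumption to tune


def bucket_path(root, record_id):
    """Map an integer ID to the file holding its group of IDS_PER_FILE IDs."""
    bucket = record_id // IDS_PER_FILE
    return os.path.join(root, f"bucket_{bucket:06d}.jsonl")


def append_records(root, records):
    """Append (record_id, payload) pairs to their bucket files, one JSON line each."""
    by_path = defaultdict(list)
    for record_id, payload in records:
        by_path[bucket_path(root, record_id)].append({"id": record_id, "data": payload})
    for path, rows in by_path.items():
        # Open each bucket file once per batch rather than once per record.
        with open(path, "a", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row) + "\n")
    return sorted(by_path)


def read_id(root, record_id):
    """Scan the single bucket file for an ID and return its payloads in write order."""
    path = bucket_path(root, record_id)
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [row["data"] for row in map(json.loads, fh) if row["id"] == record_id]


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    append_records(root, [(5, "a"), (1_500_000, "b"), (5, "c")])
    print(read_id(root, 5))
```

The point is just that a lookup touches one file instead of one file per ID, which keeps the file count manageable. Whether a linear scan inside a bucket is fast enough depends on how much data each ID carries.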