Hacker News
by vchak1 2359 days ago
I'd need to understand your domain better, but in many cases a 250GB CSV can be compressed quite effectively by switching to a columnar representation. The columns can then (potentially) be processed with SIMD/GPU-based approaches, to the point where a single server would outrun a cluster. Food for thought.
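
To make the columnar point concrete, here is a rough stdlib-only Python sketch. Everything in it is an assumption for illustration: the (id, city, amount) schema, the synthetic data, and the choice of delta and dictionary encoding with zlib on top. It does not use SIMD or a GPU itself; it only shows the layout (one contiguous typed buffer per column) that vectorized engines typically operate on, and how you might compare compressed sizes against the same rows stored as CSV text.

```python
import array
import csv
import io
import random
import zlib


def make_rows(n, seed=0):
    """Generate hypothetical (id, city, amount) rows; synthetic, not real data."""
    rng = random.Random(seed)
    cities = ["london", "paris", "berlin", "madrid"]
    return [(i, rng.choice(cities), rng.randint(0, 999)) for i in range(n)]


def to_csv_bytes(rows):
    """Row-oriented baseline: the same rows serialized as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_columns(rows):
    """Split rows into one typed, contiguous buffer per column."""
    ids, cities, amounts = zip(*rows)
    dictionary = sorted(set(cities))
    code_of = {name: code for code, name in enumerate(dictionary)}
    return {
        # Delta encoding: sequential ids become a run of identical small ints.
        "id_delta": array.array("i", [b - a for a, b in zip((0,) + ids, ids)]),
        # Dictionary encoding: each repeated string becomes a one-byte code
        # (assumes fewer than 256 distinct values in this column).
        "city_code": array.array("B", [code_of[c] for c in cities]),
        "city_dict": dictionary,
        # Fixed-width unsigned 16-bit ints instead of decimal text
        # (assumes every amount fits in 0..65535).
        "amount": array.array("H", amounts),
    }


def compressed_columnar_size(columns):
    """Total zlib-compressed size when each column is compressed separately."""
    parts = [
        columns["id_delta"].tobytes(),
        columns["city_code"].tobytes(),
        "\n".join(columns["city_dict"]).encode("utf-8"),
        columns["amount"].tobytes(),
    ]
    return sum(len(zlib.compress(part)) for part in parts)


if __name__ == "__main__":
    rows = make_rows(100_000)
    cols = to_columns(rows)
    print("csv + zlib bytes:     ", len(zlib.compress(to_csv_bytes(rows))))
    print("columnar + zlib bytes:", compressed_columnar_size(cols))
    # An aggregate only has to scan the one column it needs.
    print("sum(amount):          ", sum(cols["amount"]))
```

How much you actually save depends heavily on the data: low-cardinality strings and sorted or sequential keys compress very well, while high-entropy columns won't shrink much. In practice you'd probably reach for an existing columnar format and engine rather than rolling your own; the sketch is only meant to show why the layout helps.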