Hacker News
by opportune 2704 days ago
Not sure about GCP, but on other cloud providers Hadoop clusters are reserved by the hour and don't actually work well for saving money on batch compute. That's because Hadoop clusters need physical data colocation to meet performance needs (i.e. to avoid non-rack-local maps). Even if you came up with a by-the-hour payment mechanism for compute, you'd still need a long-term storage mechanism that keeps the data around until you can spin up co-located compute capable of operating on that storage, which is far from a trivial problem.
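
To make the "non-rack-local maps" point a bit more concrete, here's a rough sketch of how a map task's placement relative to its input block could be classified. This isn't Hadoop's actual scheduler code; the `locality` function, the node and rack names, and the replica list are all made up for illustration.

```python
def locality(task_node, replica_nodes, rack_of):
    """Classify a map task by how far it sits from the nearest replica of its input block."""
    if task_node in replica_nodes:
        return "node-local"   # the block is on the same machine
    if rack_of[task_node] in {rack_of[n] for n in replica_nodes}:
        return "rack-local"   # the block is on another machine in the same rack
    return "off-rack"         # the block has to cross racks to reach the task

# Hypothetical topology: node name -> rack name.
rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackC"}
replicas = ["n1", "n3"]  # nodes holding replicas of one input block

for node in ["n1", "n2", "n4"]:
    print(node, locality(node, replicas, rack_of))
```

If you spin up fresh hourly compute that has no relationship to where the data lives, every task ends up in the last bucket, which is the case the colocation requirement is trying to avoid.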