They are moving batch jobs to GCP, probably because occasional burst loads and often-underutilized capacity make a flexible compute fabric (aka a cloud provider) the better choice.
If they are moving everything, well, Netflix did that too. Maybe opportunity cost, maybe having your engineers work on your core product instead of running VMs is cheaper altogether.
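A back-of-the-envelope sketch of the burst argument, in Python. Every number here is made up for illustration (the demand profile and both prices are assumptions, not Twitter's or GCP's figures), and the function names are hypothetical. It only shows that a fleet sized for the peak pays for idle hours, while on-demand pays for node-hours actually used.

```python
def fixed_cost(demand, cost_per_node_hour):
    """Owned fleet sized for the peak: every node is paid for every hour."""
    return max(demand) * len(demand) * cost_per_node_hour


def on_demand_cost(demand, price_per_node_hour):
    """Cloud-style billing: pay only for the node-hours actually used."""
    return sum(demand) * price_per_node_hour


def utilization(demand):
    """Average utilization of a peak-sized fleet, between 0 and 1."""
    return sum(demand) / (max(demand) * len(demand))


# Hypothetical day: 20 hours at 10 nodes, then a 4-hour burst at 100 nodes.
demand = [10] * 20 + [100] * 4

# Assumed prices: owning costs 1.0 per node-hour, on-demand costs 3.0.
print("utilization:", utilization(demand))
print("fixed fleet:", fixed_cost(demand, 1.0))
print("on demand:  ", on_demand_cost(demand, 3.0))
```

Under these assumptions on-demand wins whenever utilization is below the ratio of the owned price to the on-demand price (here 1/3), which is the whole point about bursty, mostly idle batch capacity.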
Correct me if I'm wrong, but don't all the CDNs that Netflix purchases also need a large storage cache backing them? Meaning each CDN Netflix uses for local caching also requires a colocated datastore to get around their centralized-bandwidth issue.
You're not wrong per se, but Netflix takes a similar route to Google Global Cache in that they provide the hardware and place it inside the networks of ISPs. So it's a CDN in the sense that it's a distributed content delivery network, but not in the sense that they just use large traditional CDN providers.
Netflix provides massive storage boxes to ISPs that serve content from within the network of the ISP the user is connecting from. This can save the ISP a lot of external traffic so they generally want to do this to save costs and meet customer demands. YouTube does a similar thing.
I don't think Netflix puts their content on CDNs. A prerequisite for Netflix entering a country is whether AWS has a datacenter in that country (or for smaller countries, near that country). For example, Netflix only offered services in Australia when AWS opened an Australian datacenter. If Netflix were distributing their content via CDNs, then it wouldn't matter so much where AWS datacenters are, it would only matter where the CDN edge nodes were. I suspect that Netflix has far too much content to host it economically on a CDN.
The reason Netflix would potentially wait for a proximate AWS datacenter is that all of their apps, backend services, and UIs are served from EC2 instances; all of the actual content delivery is in fact handled by their FreeBSD-based Open Connect appliances. In other words, no, Netflix doesn't put their content on third-party CDNs like CloudFront, Fastly, Limelight Networks, etc., but they do absolutely serve it all from their own custom-built CDN hardware.
Not sure about GCP, but on other cloud providers Hadoop clusters are reserved by the hour and don't actually work well for saving money on batch compute. That's because Hadoop clusters need physical data colocation to meet performance needs (i.e. to avoid non-rack-local maps). Even if you came up with a by-the-hour payment mechanism for compute, you would still need a long-term data storage mechanism that persists long enough for you to spin up colocated compute capable of operating on that storage... not nearly a trivial problem.
I'm guessing Twitter runs their own infra for the really heavy throughput stuff. They're big enough that I'd imagine economies of scale kicked in a long time ago.
It sounds like they moved stuff like logging/metrics and query systems onto GCP, which makes sense because, as another poster said, utilization is probably bursty.
Indeed, these workloads are bursty. They also tend to involve running lots of different processing frameworks over the same data, which makes the value proposition for separating compute and storage stronger.
This thread, and this site, tell the tale. Google is using this for marketing purposes. They already have some trite two-minute marketing ad running as well. The deal that Twitter received is going to come with extensive incentives unavailable to typical operations. Twitter was also likely able to play the usual providers against each other: 'Oh, well AWS is offering me [x,y,z]. Can you do better?'