I work in HPC for a cloud provider, and I fully endorse this move. Anonymously, of course. You can make an economic argument for or against cloud in practically every IT domain, but in HPC the case for on-prem is really compelling: none of the cloud networking/resiliency value-add is relevant to batch workflows, and costs per core-hour are only remotely comparable if you use spot instances, which are themselves a major compromise. The only real advantage cloud has for science is object storage, which is genuinely a much better idea than trying to manage your own long-term archival storage. If I were independent, I would recommend that people buy and build on-prem clusters and shuffle data out of fast scratch into Glacier, but other than that, just don't worry about cloud until price pressure kicks in and we are down to 1-2 cents per core-hour on-demand. I'd love a role where I could say these things non-anonymously, but the salary for such a position would be at least 50% lower than working for a cloud provider. Keep that in mind when talking to your supplier: we may not believe the pitch ourselves, but making it is just part of the job.
As someone who has done a fair bit of HPC, I consider the real advantage to be temporary scalability. If my 'normal' compute nodes have 128 GB of RAM and all of a sudden I have a job that needs 300 GB of RAM, with cloud I can just change a line in a config file and run that calculation on a machine with 300 GB of RAM. Or if I have a job that would run optimally on hundreds of 1-core machines with only 4 GB of RAM each, I can set up a cluster of such machines within minutes.
That being said, I 100% agree that if you have a normal baseline workload, it should absolutely run on in-house hardware.
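A rough way to see where the "baseline in-house, burst to cloud" line falls is to compare the amortized on-prem cost per core-hour you actually use against a cloud per-core-hour price. Here's a minimal sketch. The figures ($87.60 per core per year all-in for on-prem, $0.04 per core-hour on-demand for cloud) are purely hypothetical and don't come from this thread or any provider, and the function names are made up for illustration.

```python
# Hypothetical break-even sketch: all prices are illustrative assumptions.
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def onprem_cost_per_used_core_hour(annual_cost_per_core: float, utilization: float) -> float:
    """Amortized on-prem cost for each core-hour that actually does work.

    annual_cost_per_core: hypothetical all-in yearly cost of one core
    (hardware amortization, power, cooling, staff).
    utilization: fraction of the year the core is busy, in (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Idle hours still cost money, so lower utilization raises the effective price.
    return annual_cost_per_core / (HOURS_PER_YEAR * utilization)


def break_even_utilization(annual_cost_per_core: float, cloud_price_per_core_hour: float) -> float:
    """Utilization above which on-prem is cheaper per used core-hour.

    A result above 1.0 means cloud is cheaper even at 100% on-prem utilization.
    """
    return annual_cost_per_core / (HOURS_PER_YEAR * cloud_price_per_core_hour)


if __name__ == "__main__":
    annual = 87.60  # hypothetical $/core/year, all-in
    cloud = 0.04    # hypothetical on-demand $/core-hour
    print(f"break-even utilization: {break_even_utilization(annual, cloud):.0%}")
    # prints "break-even utilization: 25%"
```

Under these made-up numbers, a cluster that stays busier than about a quarter of the time wins on-prem, while rare spikes (like the 300 GB job) are cheaper to rent than to keep idle hardware around for.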