In the past, I’ve said multi-cloud as a method for vendor diversity wasn’t worth the effort. Have COVID-19-related capacity constraints in Azure changed my mind?
One of the advertised advantages of moving to the public cloud is transforming procurement. The myth: companies would no longer pre-purchase capacity and could better distribute capital. IT managers could take a just-in-time procurement approach to the effectively infinite capacity offered by the hyperscale cloud providers. Worst-case scenario, if your preferred cloud provider ran out of capacity, you’d leverage your multi-cloud capability and spin up additional capacity or workloads elsewhere.
Up until the COVID-19 pandemic, this was all theory. Since then, there have been anecdotal stories of customers unable to deploy beyond their quota in some Azure regions. Analysts have inferred that the significant increase in Microsoft Teams usage has resulted in capacity constraints for Azure. I haven’t heard of similar constraints for other hyperscale cloud providers.
What have we learned?
What does this mean for the procurement strategy for the public cloud? Should enterprises spread risk by leveraging multiple cloud providers? No, we learned that abstracting the supply chain via hyperscale cloud providers doesn’t change the laws of physics.
I believe data gravity makes multi-cloud an impractical solution to managing the supply chain. In the instance of this pandemic, having the capacity available in another cloud provider wouldn’t solve most challenges. Engineers have to weigh the cost of egress data transfer and the latency between cloud providers against the value of the cloud resources available elsewhere.
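To make the data gravity trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names are mine, and the dataset size, per-GB egress price, and link bandwidth are placeholders, not quotes from any provider. Plug in your own numbers.

```python
# Back-of-the-envelope estimate of what it takes to move a dataset
# from one cloud provider to another. All inputs are hypothetical.

def egress_cost_usd(data_gb: float, price_per_gb: float) -> float:
    """One-time egress charge for moving data_gb out of a provider."""
    return data_gb * price_per_gb


def transfer_hours(data_gb: float, bandwidth_gbps: float) -> float:
    """Best-case transfer time, assuming the link is fully saturated."""
    gigabits = data_gb * 8              # gigabytes to gigabits
    seconds = gigabits / bandwidth_gbps
    return seconds / 3600


if __name__ == "__main__":
    # Placeholder inputs: a 50 TB dataset, an assumed egress price,
    # and an assumed 10 Gbps link between providers.
    data_gb = 50_000
    price_per_gb = 0.08
    bandwidth_gbps = 10

    cost = egress_cost_usd(data_gb, price_per_gb)
    hours = transfer_hours(data_gb, bandwidth_gbps)
    print(f"Estimated egress cost: ${cost:,.2f}")
    print(f"Best-case transfer time: {hours:.1f} hours")
```

The transfer time is a lower bound, since real links rarely stay saturated. Even so, the exercise shows why spare compute in another cloud is of little use if the data it needs takes hours and real money to get there.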
There are high-availability designs where customers could fail back to their private data center or fail over to a different cloud provider. Again, cost and latency are limiting factors. On top of that, the differences between cloud control planes today severely impact steady-state operations. We’ve consistently preached against making significant changes to your operational processes as part of an emergency response. So, unless your services rely on a common layer such as Kubernetes or VMware vSphere, I don’t consider failover a great mitigation strategy for cloud provider capacity constraints.
Are you still determined to leverage multiple cloud providers to mitigate availability and capacity risk? Look to abstract your data from your cloud provider’s infrastructure. By placing data in cloud storage platforms that exist in cloud-adjacent co-locations, many IT operations could take advantage of compute from multiple cloud providers. However, this takes considerable planning and isn’t something I’d recommend implementing in the middle of a pandemic.
What’s old is new
So, what have I seen that works? How are some of Microsoft’s largest Azure customers managing capacity during a constrained period? According to Azure’s website, one of the advantages of reserved instances and capacity is prioritized compute capacity in Azure regions. IT leaders I’ve engaged have a strategy of purchasing reserved instances for the expected cost savings, but also as mitigation against capacity constraints during a disaster or emergency event.
The infinite scale of hyperscale providers is indeed infinite until it isn’t any longer. The process of pre-purchasing capacity highlights how the public cloud service delivery model doesn’t always solve traditional enterprise IT challenges. Also, customers should consider separating their cloud storage strategy from their cloud compute strategy. By decoupling your storage from your cloud provider, you begin to build the process and technology muscle memory to pivot compute from one provider to another.