The move from on-premises to Azure brings its fair share of paradigm shifts to traditional architecture design, even if adoption of Azure services is not in the plan. Virtual Machine (VM) availability, or uptime, is one area that can cause a lot of issues if not planned for ahead of time, leading to rework and possible downtime. In this three-part blog series, we will look at why availability planning is so important, as well as the tools Microsoft provides to ensure maximum uptime can be achieved. Let’s get started…
Uptime guarantees, which Microsoft refers to as the Service Level Agreement (SLA), almost always involve redundancy. Redundant power, networking, ISPs, and even cooling in some cases are all standard fare in high availability environments. At a VM level, common wisdom recommends an N+1 host deployment to ensure there is enough compute power to handle the load in the case of a single host failure. VMs then can be moved to different hosts to make sure they stay running.
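To make the N+1 rule concrete, here is a minimal sizing sketch. The function name, the units (GB of RAM), and the sample numbers are hypothetical and chosen only for illustration. Real capacity planning would also account for CPU, storage, and headroom.

```python
import math


def hosts_needed(total_vm_load, host_capacity, spare_hosts=1):
    """Return the host count for an N+1 style deployment.

    N is the number of hosts required to carry the full load;
    spare_hosts (1 for N+1) is added on top so the load still fits
    after a single host failure.
    """
    if host_capacity <= 0:
        raise ValueError("host_capacity must be positive")
    n = math.ceil(total_vm_load / host_capacity)
    return n + spare_hosts


# Hypothetical example: VMs needing 900 GB of RAM in total,
# on hosts that each provide 256 GB.
print(hosts_needed(900, 256))  # 4 hosts for the load + 1 spare = 5
```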
This raises the question: doesn’t Microsoft handle all of that in Azure? Well, yes. And also no. Availability falls into the category of what Microsoft refers to as shared responsibility, meaning that both Microsoft and the customer have a part to play. It is shared because host-level redundancy may not be available. Microsoft will handle all the power and networking, but if a host fails, the VMs on that host will not necessarily be moved to another host and restarted.
This is where the paradigm shift comes in. Redundancy in Azure – and all major hyperscale clouds for that matter – is best handled at the application layer, not at the host layer. This does not mean that a legacy application that was never written for application-layer redundancy cannot move to the cloud – it just means that an understanding of the configuration options is needed to ensure the correct architecture design is used.
What it all boils down to is the disk. While that may be slightly oversimplified, the disk selection is indeed where it all starts. There are 4 tiers of disk in Azure: Standard HDD, Standard SSD, Premium SSD, and Ultra Disk. The higher the tier, the higher the SLA and, as you might expect, the higher the price tag. The SLAs are as follows for a single VM:
For VMs with mixed disk tiers, the SLA from the lowest tier applies.
A 95% SLA for a single VM running HDD may not seem very reliable, but Microsoft originally offered no SLA for VMs with that configuration, so 95% is quite an improvement. Still, it is unacceptable for most production workloads.
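To put these percentages in perspective, the sketch below applies the lowest-tier rule for mixed disks and converts an SLA into the downtime it permits per month. Only the 95% figure for Standard HDD comes from this post. The other SLA values are assumptions based on Microsoft's published single-VM SLA and should be checked against the current SLA document. The function names and the 30-day month are also illustrative choices.

```python
# Single-VM SLA per disk tier, in percent.
# Only the Standard HDD value (95%) is stated in this post; the other
# values are ASSUMPTIONS and should be verified against Microsoft's
# current SLA for Virtual Machines.
SINGLE_VM_SLA = {
    "Standard HDD": 95.0,
    "Standard SSD": 99.5,
    "Premium SSD": 99.9,
    "Ultra Disk": 99.9,
}

HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month


def effective_sla(disk_tiers):
    """SLA for a VM with mixed disk tiers: the lowest tier's SLA applies."""
    return min(SINGLE_VM_SLA[tier] for tier in disk_tiers)


def allowed_downtime_hours(sla_percent, hours=HOURS_PER_MONTH):
    """Hours of downtime per month that still meet the given SLA."""
    return (100.0 - sla_percent) * hours / 100.0


# A VM with a Premium SSD OS disk and a Standard HDD data disk
# is only covered at the Standard HDD level.
sla = effective_sla(["Premium SSD", "Standard HDD"])
print(sla, allowed_downtime_hours(sla))
```

Under these assumptions, a 95% SLA allows roughly 36 hours of downtime in a 30-day month, compared with well under an hour at 99.9%.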
A simple solution could be to always use Premium or Ultra disks. If the performance is needed, then that is an obvious choice. Also, if the application cannot be configured for application-level failover, this may be the only solution. However, for a machine with low disk utilization, such as a web server or domain controller, it may add unnecessary cost.
As a quick example of the cost difference, compare an S (Standard HDD) disk to the equivalent P (Premium) disk. An S10 (128 GB) is around $6/month. A P10 (128 GB) is around $20/month. Not much of a difference, but at scale this will add up quickly. The move to larger disks can make the cost delta significant. An S50 (4 TB) is around $164/month, but a P50 (4 TB) is around $495/month.
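The effect of scale can be sketched with a few lines of arithmetic. The prices are the approximate monthly figures quoted above and will vary by region and over time. The fleet size of 100 disks and the function name are hypothetical.

```python
# Approximate monthly prices in USD, taken from the figures quoted in
# this post. Actual Azure pricing varies by region and changes over time.
MONTHLY_PRICE = {"S10": 6, "P10": 20, "S50": 164, "P50": 495}


def premium_cost_delta(standard_sku, premium_sku, disk_count, months=12):
    """Extra cost of choosing the Premium disk over the Standard one."""
    per_disk = MONTHLY_PRICE[premium_sku] - MONTHLY_PRICE[standard_sku]
    return per_disk * disk_count * months


# Hypothetical fleet of 100 disks over one year.
print(premium_cost_delta("S10", "P10", disk_count=100))  # 16800
print(premium_cost_delta("S50", "P50", disk_count=100))  # 397200
```

At $14 per disk per month the 128 GB difference looks small, but across 100 disks it is about $16,800 a year. For 4 TB disks the same fleet costs roughly $397,200 more per year.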
A better solution for applications that do support application-level redundancy (like IIS or SQL Server) is Azure Availability Sets or Availability Zones. These configurations allow for a higher SLA than standalone machines, and that SLA is not dependent on the disk configuration.
The differences between these options, and what must be planned for to take advantage of them, will be discussed in the next posts of this three-part series.