In Part 1 of this three-part blog series, we discussed Azure availability of a single Virtual Machine (VM). This post will expand on the topic by reviewing Availability Sets and how they can provide an increased Service Level Agreement (SLA) without the need for Premium disks.
Before diving into the why and how, we need to clearly identify what is needed, and that begins with understanding how Microsoft manages the cabinets in their data centers.
In Part 1 of 3, the concept of N+1 hosts was briefly discussed, and it comes into play here as well. As a refresher, N+1 simply means that there are enough hosts in an environment that one host can be completely shut down without affecting the performance or availability of the VMs in the environment. Microsoft, obviously, has many more spare hosts than just one, and all those hosts need to be maintained. This can involve hardware refreshes, software updates, or any of the other myriad maintenance tasks that make getting out of the data center business and moving to Azure so enticing.
Microsoft manages the logistics of host availability by dividing them into domains. These are not AD domains; they are linked to the underlying infrastructure of the data center racks. There are two domains that Microsoft will assign a host to: fault and update. Every host is a member of both a fault domain and an update domain.
Fault domains define how the host gets networking and power. A host in fault domain 1 will not be linked to the same services as a host in fault domain 2, for example. In this way, a power outage should not bring down more than one fault domain. Similarly, update domains define when the underlying host OS gets updated. Hosts in update domain 1 will therefore not be patched at the same time as hosts in update domain 2. Host downtime due to patching is almost never an issue anymore. Microsoft has made great improvements in the process of migrating VMs off hosts that are about to be updated, which prevents unforeseen downtime. Microsoft has gone so far as to remove the mention of update domains from its SLA documentation, but update domains still exist.
Microsoft provides a great visual for this concept in the below diagram.
Extending this down to the VM level, a single VM will also always belong to a fault domain and an update domain. Selecting, or even knowing, which domains a single VM belongs to is irrelevant, because there is nothing to guard against. If the host that a standalone VM is on loses power, there will be an outage.
Where this knowledge does become useful is when applications can be made available across hosts. When a web farm, for example, is stood up, there is always more than one VM – hence the term farm. Regardless of the number of servers in the farm, the desire is to always have at least one running. If a farm that consisted of 3 servers was deployed as standalone VMs in Azure, there would be no way to know which domains the individual VMs were on. This would subject them to potential simultaneous outages either through hardware failure or host updates. In addition, each of the VMs would be subject to the standalone SLA, e.g., 95% uptime if they were deployed with HDDs.
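To put the standalone figure in perspective, an SLA percentage can be converted into the amount of downtime it permits. The following is a minimal sketch; the function name and the 30-day month are assumptions made for illustration, and an SLA is a contractual commitment rather than a prediction of actual downtime.

```python
def allowed_downtime_hours(sla_percent: float, period_hours: float = 30 * 24) -> float:
    """Return the downtime (in hours) that an SLA percentage permits over a period.

    period_hours defaults to a 30-day month, which is an assumption for illustration.
    """
    return round((100.0 - sla_percent) / 100.0 * period_hours, 4)


# The standalone HDD figure quoted above.
standalone_hdd = allowed_downtime_hours(95.0)
print(f"A 95% SLA permits up to {standalone_hdd} hours of downtime in a 30-day month")
```

The same function can be applied to any other SLA percentage to compare the downtime budgets side by side.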
To account for this, Microsoft provides the ability to link VMs to an availability set. An availability set is a free resource that keeps track of the servers being deployed and ensures that they are spread out across multiple domains. The VMs in the above farm, for example, would be spread out across fault domains 1 and 2 and update domains 1, 2, and 3. In this configuration, at least one of the VMs should always be available. Using this deployment method, the collection of VMs attains a 99.95% uptime guarantee. To be clear, the individual VMs maintain their respective SLAs, but the availability set grants a 99.95% guarantee that at least one VM in the set will be online. In addition, the 99.95% SLA is granted regardless of the disk type used. So the farm, which would sit at a 95% SLA as standalone VMs on HDDs, moves up to 99.95% simply by deploying the same VMs in an availability set.
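The distribution described above can be modeled in a few lines. This is a sketch only: the round-robin assignment, the function names, and the VM names are assumptions for illustration, not a documented description of how Azure places VMs. It shows that with 2 fault domains and 3 update domains, losing any single domain still leaves at least one of the three VMs running.

```python
def place_vms(vm_names, fault_domains=2, update_domains=3):
    """Assign each VM a fault domain and an update domain in round-robin order.

    Round-robin is an illustrative assumption, not Azure's documented algorithm.
    Domains are numbered from 1 to match the text.
    """
    placement = {}
    for index, name in enumerate(vm_names):
        placement[name] = {
            "fault_domain": index % fault_domains + 1,
            "update_domain": index % update_domains + 1,
        }
    return placement


def survivors(placement, domain_kind, failed_domain):
    """Return the VMs that stay up when one domain of the given kind goes down."""
    return [
        name
        for name, domains in placement.items()
        if domains[domain_kind] != failed_domain
    ]


# Hypothetical three-server web farm.
farm = place_vms(["web1", "web2", "web3"])

for fd in (1, 2):
    print(f"fault domain {fd} down -> running: {survivors(farm, 'fault_domain', fd)}")
for ud in (1, 2, 3):
    print(f"update domain {ud} down -> running: {survivors(farm, 'update_domain', ud)}")
```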
Keep in mind that networking resources, such as a load balancer, must also be deployed to allow for continued availability between the VMs in the availability set. The availability set only manages domain distribution; it does not provide any kind of network routing.
Availability sets are resources that need to be created, like the VMs that are linked to them. There are 2 types of availability sets: classic and aligned. Classic does not mean the availability set can only be used by classic (ASM) resources; the type is tied to the type (not tier) of disk deployed. Managed disks, which are the recommended disk type, require an aligned availability set. Unmanaged disks (storage accounts) require the classic availability set. If an availability set is created ahead of the VMs and its type does not match the disk type, it will not show up as an option at VM creation.
The only other options available when creating the availability set are the number of fault domains and the number of update domains it should distribute across. The number of domains available will vary by region, but there will always be at least 2 fault domains, and most regions offer up to 20 update domains.
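The choices above (disk type, fault domain count, update domain count) can be captured in a small helper that builds a resource definition. This is a hedged sketch: the property names follow the ARM template schema for availability sets as I understand it and should be verified against the current reference, the API version is deliberately omitted, and the set name, region, and domain counts in the usage lines are placeholders.

```python
import json


def availability_set_sku(uses_managed_disks: bool) -> str:
    """Managed disks need an aligned set; unmanaged disks need a classic set."""
    return "Aligned" if uses_managed_disks else "Classic"


def availability_set_definition(name, location, uses_managed_disks,
                                fault_domains, update_domains):
    """Build a dict shaped like an ARM availability set resource (illustrative only)."""
    if fault_domains < 1:
        raise ValueError("at least one fault domain is required")
    if not 1 <= update_domains <= 20:
        raise ValueError("update domains must be between 1 and 20")
    return {
        "type": "Microsoft.Compute/availabilitySets",
        "name": name,
        "location": location,
        "sku": {"name": availability_set_sku(uses_managed_disks)},
        "properties": {
            "platformFaultDomainCount": fault_domains,
            "platformUpdateDomainCount": update_domains,
        },
    }


# Placeholder name, region, and counts for illustration.
web_set = availability_set_definition("web-avset", "eastus", True, 2, 5)
print(json.dumps(web_set, indent=2))
```

Because the maximum number of fault domains depends on the region, the sketch checks only the lower bound; the upper bound should come from the region being deployed to.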
The most important thing to understand about setting up VMs in an availability set is that it can only be done at VM creation. An existing VM cannot be added to or removed from an availability set. The VM would need to be deleted and recreated, which is not as bad as it sounds, because the existing disks can be attached to the new VM. It can be tedious, however, and it will result in downtime for the VM.
The next most important thing to know about availability sets is that they need to be deployed based on service grouping. For example, a web server and a database server should never be in the same availability set. The SLA would then guarantee only that one of the two servers is available, not both. So, in the case of an outage, one server could drop, which would render the other server ineffective. In this instance, all the web servers should be in one availability set and all the database servers (for example a SQL Server AOAG) should be in a different availability set. This will ensure that at least one server from each service type is available.
Availability sets have a clear advantage over deploying single VMs; however, the application becomes the limiting factor, because it must be able to run across multiple VMs. There is one more bump in SLA that we can achieve by taking advantage of availability zones. This will be the topic of the last post in this series.
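The grouping rule can be checked mechanically before deployment. The sketch below uses a hypothetical inventory format (the `name`, `role`, and `availability_set` keys are assumptions, not an Azure API) and flags any availability set that mixes service roles.

```python
from collections import defaultdict


def mixed_role_sets(vms):
    """Return the names of availability sets that contain more than one service role."""
    roles_by_set = defaultdict(set)
    for vm in vms:
        roles_by_set[vm["availability_set"]].add(vm["role"])
    return sorted(name for name, roles in roles_by_set.items() if len(roles) > 1)


# Hypothetical inventory: one web server and one database server share a set.
bad_plan = [
    {"name": "web1", "role": "web", "availability_set": "app-avset"},
    {"name": "sql1", "role": "db", "availability_set": "app-avset"},
]

# Same servers plus a second node per tier, grouped by service role.
good_plan = [
    {"name": "web1", "role": "web", "availability_set": "web-avset"},
    {"name": "web2", "role": "web", "availability_set": "web-avset"},
    {"name": "sql1", "role": "db", "availability_set": "db-avset"},
    {"name": "sql2", "role": "db", "availability_set": "db-avset"},
]

print("Mixed sets in bad plan:", mixed_role_sets(bad_plan))
print("Mixed sets in good plan:", mixed_role_sets(good_plan))
```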