What Is High Availability?
Most IT server systems need to be available whenever a client needs them, with a high degree of reliability. For most websites and internet services this generally means 24 hours a day, 7 days a week, 52 weeks a year, every year. There are many steps you can take to meet this level of availability, and the techniques for doing so are collectively referred to as high availability. The ultimate aim is a system that experiences an absolute minimum of downtime.
One of the main causes of downtime is hardware failure, so one of the first tasks in building a highly available system is to remove the risk of a single hardware failure bringing down the whole system. This is normally achieved by running duplicates of all hardware systems in what is termed a cluster, in which multiple servers perform identical tasks. If any one server fails, the overall system remains functional.
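To put rough numbers on why duplication helps, here is a minimal Python sketch. It assumes that servers fail independently of one another and that the system stays up as long as at least one server is up; real clusters rarely meet both assumptions exactly, so treat the result as an upper bound. The function names and the 99% figure are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365


def cluster_availability(server_availability: float, servers: int) -> float:
    """Availability of a cluster that is up while at least one server is up.

    Assumes server failures are independent, which is an idealisation.
    """
    if not 0.0 <= server_availability <= 1.0:
        raise ValueError("server_availability must be between 0 and 1")
    if servers < 1:
        raise ValueError("servers must be at least 1")
    # The cluster is down only when every server is down at the same time.
    return 1.0 - (1.0 - server_availability) ** servers


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative figure: each server is assumed to be up 99% of the time.
    for count in (1, 2, 3):
        combined = cluster_availability(0.99, count)
        hours = downtime_hours_per_year(combined)
        print(f"{count} server(s): {combined:.6f} available, "
              f"about {hours:.2f} hours down per year")
```

Under these assumptions, a second server takes the expected downtime from roughly 88 hours a year to under one hour, which is the main argument for clustering.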
High Availability Clusters and Virtualization
With the growing popularity of virtualization, this technology is often used to help create highly available systems. Running the software tasks on a virtual machine that can operate on any piece of hardware in the cluster greatly simplifies hardware clusters. All the hardware servers can be built with identical software, and if one hardware server dies the virtual machine can be moved to another.
The problem is that such a cluster can be expensive to build and requires considerable knowledge and skill to manage. Fortunately, with VPS.net the infrastructure is already built and managed for you with our Cloud VPS, so all you need to do is build the virtual machines (also referred to as Virtual Private Servers) that you need.
Software Considerations on Clusters
Unfortunately, a virtual machine that can be moved between hardware servers only solves the problem of hardware failures. Software problems can also take your system offline. While hardware issues often have simple solutions, such as replacing the failed part, software issues can lead to more prolonged downtime while systems are corrected. A key example of a software failure is filesystem corruption, which can happen seemingly at random. Software can also crash, leaving your service offline until it is fixed and restarted, and it may be maliciously damaged by hackers attacking the server. For these reasons it is beneficial to run at least two identical virtual machines, so that you can fail over from one to the other if one develops a problem.
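The failover idea above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how any particular clustering tool does it: the server names are hypothetical, and the health check is a plain TCP connection attempt to port 80, which is an assumption about what "healthy" means for a web service.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_health_check(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Treat a server as healthy if a TCP connection to it succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def choose_active(
    servers: Sequence[str],
    is_healthy: Callable[[str], bool] = tcp_health_check,
) -> Optional[str]:
    """Return the first healthy server in priority order, or None.

    The first entry is the preferred primary; later entries are standbys.
    """
    for server in servers:
        if is_healthy(server):
            return server
    return None


if __name__ == "__main__":
    # Hypothetical host names for a two-server pair.
    active = choose_active(["vm-a.example.com", "vm-b.example.com"])
    print(f"Traffic should go to: {active}")
```

Because the primary is listed first, traffic returns to it once it passes the health check again. A production setup would also need to avoid both servers believing they are active at the same time, which this sketch does not address.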
Another benefit of running multiple virtual machines comes when you need to upgrade your software. You can take the time to safely upgrade the server that is not currently serving customers, without causing them any issues. Once that upgrade is complete, you force a failover to the upgraded server so that your customers see the upgraded version, and you can then upgrade the original primary server without causing any disruption.
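The upgrade sequence described above can be modelled as three steps. The following Python sketch only tracks which server is primary and which version each one runs; the `Server` class, the names and the version strings are hypothetical, and the real work of upgrading and redirecting traffic is left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Server:
    name: str
    version: str


def rolling_upgrade(
    primary: Server, standby: Server, new_version: str
) -> Tuple[Server, Server]:
    """Upgrade a two-server pair one at a time.

    Returns the pair as (primary, standby) after the roles have swapped.
    """
    # Step 1: upgrade the standby while the primary keeps serving customers.
    standby.version = new_version

    # Step 2: force a failover so the upgraded server becomes the primary.
    primary, standby = standby, primary

    # Step 3: upgrade the former primary, which no longer takes traffic.
    standby.version = new_version

    return primary, standby


if __name__ == "__main__":
    new_primary, new_standby = rolling_upgrade(
        Server("vm-a", "1.0"), Server("vm-b", "1.0"), "2.0"
    )
    print(f"primary: {new_primary}, standby: {new_standby}")
```

At every step one server is serving customers on a known version, which is what keeps the upgrade free of disruption.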
This kind of solution does take some additional work over launching a simple pre-built VPS, but if you feel that you need this level of availability then the work investment is definitely worth it. Over the coming weeks we’ll be looking at how you can build a highly available pair of Virtual Private Servers running a Linux distribution to run your web service on one of our clouds. The pair will automatically fail over should one server go offline, and recover when that server returns to service.