
Redundant Systems

Although many fault-prone internal components can be made redundant within a system, internal redundancy can only go so far. If system availability is extremely important, it might be prudent to keep entire spare systems in inventory as a means of recovery. Alternatively, many organizations have an SLA with their hardware manufacturers that guarantees rapid procurement of replacement equipment. Recovery through procurement may take longer. If those recovery times are acceptable, though, quick procurement options are likely to be far cheaper than keeping spare equipment on hand for ad hoc system recovery.

High-Availability Clusters

Some applications and systems are so critical that standby redundant systems or spare hardware cannot meet their uptime requirements. These systems and applications typically require what is commonly called a high-availability (HA) or failover cluster. A high-availability cluster uses multiple systems that are already installed, configured, and plugged in. If one of the systems fails, another can seamlessly take over and keep the service or application available.
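A failover cluster has to detect that a node has failed before another node can take over. One common way to do this is with heartbeats: each node periodically signals that it is alive, and a node that stays silent past a timeout is treated as failed. The sketch below assumes a two-node pair and a fixed heartbeat timeout. The class names `Node` and `FailoverPair` are hypothetical and used only for illustration. Time is passed in explicitly so that the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node that records the time of its last heartbeat."""
    name: str
    last_heartbeat: float = 0.0

    def heartbeat(self, now: float) -> None:
        # Record that the node reported itself alive at time `now`.
        self.last_heartbeat = now

    def is_alive(self, now: float, timeout: float) -> bool:
        # A node counts as alive if its last heartbeat falls within the timeout window.
        return now - self.last_heartbeat <= timeout


class FailoverPair:
    """Hypothetical two-node failover cluster: one node serves, the other stands by."""

    def __init__(self, primary: Node, secondary: Node, timeout: float = 3.0):
        self.active = primary
        self.standby = secondary
        self.timeout = timeout

    def check(self, now: float) -> str:
        # Promote the standby only if the active node has gone silent and the
        # standby is itself still healthy. Otherwise keep the current active node.
        active_down = not self.active.is_alive(now, self.timeout)
        standby_up = self.standby.is_alive(now, self.timeout)
        if active_down and standby_up:
            self.active, self.standby = self.standby, self.active
        return self.active.name


if __name__ == "__main__":
    a, b = Node("node-a"), Node("node-b")
    cluster = FailoverPair(a, b, timeout=3.0)
    a.heartbeat(1.0)
    b.heartbeat(1.0)
    print(cluster.check(2.0))  # node-a: both nodes are healthy, no failover
    b.heartbeat(5.0)           # node-a sends no further heartbeats
    print(cluster.check(5.5))  # node-b: node-a missed its window, standby promoted
```

Real cluster software adds safeguards this sketch omits. One example is fencing a failed node so that two nodes never serve at once.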

The implementation details of a high-availability cluster vary considerably, but a few basic considerations apply to all of them. The primary consideration is whether each node of an HA cluster is actively processing data before any failure occurs. If every node is, the cluster is in an active-active configuration, commonly referred to as load balancing. An active-active, or load balancing, configuration is typically costlier than an active-passive, or hot standby, configuration. In an active-passive configuration, the backup systems begin processing only when a failure state is detected.
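The difference between the two configurations shows up in how requests are routed. The sketch below is a minimal illustration and assumes round-robin distribution for active-active mode. Real load balancers may use other strategies. In active-passive mode, all traffic goes to the first healthy node in a fixed priority order. The `Router` class and its method names are hypothetical.

```python
from itertools import count


class Router:
    """Hypothetical request router for an HA cluster in one of two modes."""

    def __init__(self, nodes, mode):
        if mode not in ("active-active", "active-passive"):
            raise ValueError(f"unknown mode: {mode}")
        self.nodes = list(nodes)        # in active-passive mode, order is failover priority
        self.mode = mode
        self.healthy = set(self.nodes)  # nodes currently eligible to receive work
        self._counter = count()         # drives round-robin selection

    def mark_failed(self, node):
        # Remove a node from service, e.g. after failure detection.
        self.healthy.discard(node)

    def route(self):
        candidates = [n for n in self.nodes if n in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy nodes available")
        if self.mode == "active-active":
            # Every healthy node shares the load in turn.
            return candidates[next(self._counter) % len(candidates)]
        # Active-passive: the highest-priority healthy node takes all traffic.
        return candidates[0]


if __name__ == "__main__":
    aa = Router(["a", "b", "c"], "active-active")
    print([aa.route() for _ in range(4)])  # ['a', 'b', 'c', 'a']

    ap = Router(["a", "b"], "active-passive")
    print([ap.route() for _ in range(3)])  # ['a', 'a', 'a']: standby idles
    ap.mark_failed("a")
    print(ap.route())                      # 'b': standby begins processing
```

In the active-passive case, node `b` does no work until `a` fails. This is the hot-standby behavior the text describes.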