Page337

Recovery Point Objective

The Recovery Point Objective (RPO) is the amount of data loss or system inaccessibility (measured in time) that an organization can withstand. “If you perform weekly backups, someone made a decision that your company could tolerate the loss of a week’s worth of data. If backups are performed on Saturday evenings and a system fails on Saturday afternoon, you have lost the entire week’s worth of data. This is the recovery point objective. In this case, the RPO is 1 week” [17].

RPOs are defined by specific actions that require users to obtain data access. For example, the RPO for the NASDAQ stock exchange would be: the point in time when users are allowed to execute a trade (the next available trading day).

This requires NASDAQ to always be available during recognized trading hours, no matter what. When there are no trades occurring on NASDAQ, the system can afford to be offline, but in the event of a major disruption, the recovery point objective would be when users require access in order to execute a trade. If users fail to receive access at that point, then the NASDAQ trading system will suffer a significant business impact that would negatively affect the NASDAQ organization.

The RPO represents the maximum acceptable amount of data/work loss for a given process because of a disaster or disruptive event.

Recovery Time Objective (RTO) and Work Recovery Time (WRT)

The Recovery Time Objective (RTO) describes the maximum time allowed to recover business or IT systems. RTO is also called the systems recovery time. This is one part of Maximum Tolerable Downtime: once the system is physically running, it must be configured.

Work Recovery Time (WRT) describes the time required to configure a recovered system. “Downtime consists of two elements, the systems recovery time and the work recovery time. Therefore, MTD = RTO + WRT” [17].

Mean Time Between Failures

Mean Time Between Failures (MTBF) quantifies how long a new or repaired system will run before failing. It is typically generated by a component vendor and is largely applicable to hardware as opposed to applications and software. A vendor selling LCD computer monitors may run 100 monitors 24 hours a day for 2 weeks and observe just one monitor failure. The vendor then extrapolates the following:

100 LCD computer monitors × 14 days × 24 hours/day = 1 failure/33,600 hours

This does not mean that one LCD computer monitor will be able to run for 3.8 years (33,600 hours) without failing [18]. Each monitor may fail at rates significantly different from this calculated mean (or average in this case). However, for planning purposes, we can assume that if we were running an office with 20 monitors, we can expect that one will fail about every 70 days. Once the vendor releases the MTBF, it is incumbent upon the BCP/DRP team to determine the correct amount of expected failures within the IT system during a course of time. Calculating the MTBF becomes less reliant when an organization uses fewer and fewer hardware assets. See the example below to see how to calculate the MTBF for 20 LCD computer monitors.

1 failure/33,600 hours = 20 LCD computer monitors × X days × 24 hours/day

Solve for X by dividing both sides of the equation by 20 × 24:

X days = 33,600/20 × 24
X days = 70

Mean Time to Repair (MTTR)

The Mean Time to Repair (MTTR) describes how long it will take to recover a specific failed system. It is the best estimate for reconstituting the IT system so that business continuity may occur.

Minimum Operating Requirements

Minimum Operating Requirements (MOR) describe the minimum environmental and connectivity requirements in order to operate computer equipment. It is important to determine and document what the MOR is for each IT-critical asset because, in the event of a disruptive event or disaster, proper analysis can be conducted quickly to determine if the IT assets will be able to function in the emergency environment.