maximum allowable downtime does not arbitrarily shorten as a
result of an increased capability resilience level.
For a HRC, RTO has nothing to do with “sustaining” the
user’s experience, because the capability’s remaining production
sites continue to provide service. Criticality, resilience, and feasibility drive the urgency of RTO. RTO feasibility derives from
several factors, such as the distance between production and
recovery sites, the cost of the recovery solution, and whether the
disaster occurs during business hours (and, if it does not, whether
the staff is required or prepared to respond in off-hours).
Relationships
The alignment of RTO to MAD depends on the capability’s
resilience level. The following two diagrams display the general
relationship between downtime, resilience and recovery, for both
technology and functions.
This article’s underlying assumption for evaluating resiliency
is a single-site outage. Large-scale disasters could affect multiple
production sites at the same time, but a single-site framework serves
as a reasonable baseline.
On the technology diagram, the vertical axis shows maximum
allowable downtime (MAD) on a logarithmic scale, with a shortest
possible value of zero. The horizontal axis shows resilience
level, beginning with Level 1 and increasing to the right.
The diagonal red lines show recovery time objective on a relative “shorter-longer” scale (not necessarily equaling the values
on the MAD logarithmic scale). The RTO could approach zero
at technology resilience Level 2. For resilience Level 1, RTO
must increase, and the dashed lines indicate typical RTOs
measured between hours and days. As the resilience level
increases, the RTO can potentially lengthen considerably. The
dollar signs show the relatively higher expense that is often
associated with shorter RTOs, and the lower expense associated
with longer ones. The gray “uncertainty triangle” on the lower left side
of the diagram illustrates the inherent conflict and increased
risk that arise from combining a very low MAD with
Resilience Level 1.
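The uncertainty triangle can be read as a simple screening rule. The sketch below is illustrative only: the function name `in_uncertainty_triangle` and the four-hour cutoff for a "very low" MAD are assumptions, not values taken from the diagram, and the rule flattens the triangle into a single threshold that applies at Resilience Level 1.

```python
def in_uncertainty_triangle(mad_hours: float, resilience_level: int,
                            low_mad_threshold_hours: float = 4.0) -> bool:
    """Flag the combination of a very low MAD and Resilience Level 1.

    low_mad_threshold_hours is a hypothetical cutoff chosen for
    illustration; the diagram itself gives no numeric boundary.
    """
    if mad_hours < 0:
        raise ValueError("MAD cannot be negative")
    if resilience_level < 1:
        raise ValueError("resilience levels start at 1")
    return resilience_level == 1 and mad_hours <= low_mad_threshold_hours


# A one-hour MAD at Level 1 falls inside the flagged region;
# the same MAD at Level 3, or a 72-hour MAD at Level 1, does not.
print(in_uncertainty_triangle(mad_hours=1, resilience_level=1))
print(in_uncertainty_triangle(mad_hours=1, resilience_level=3))
print(in_uncertainty_triangle(mad_hours=72, resilience_level=1))
```

A capability that trips this check would need either a higher resilience level or a longer agreed MAD before a realistic RTO could be set.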
For the function relationship diagram, the basic structure
resembles the first diagram. The position of the RTO lines
displays the main difference. The dashed red RTO line begins
at a higher point on the vertical scale, indicating the significant
difficulty associated with relocating (in less than several hours)
the people who perform a function. The dashed line also shows
there is no absolute lower boundary to RTO. The solid RTO line
suggests that complete recovery of larger groups of people could
require days at the lower end of the resilience scale. Relocating
staff imposes definite physical limits on how short the RTO can
be. As before, higher relative cost
attaches to shorter RTOs. In general, the upper boundary of
RTO could decrease gradually as the resilience level increases,
because a smaller proportion of the staff
must relocate when a single site is affected.
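The arithmetic behind this last point can be sketched briefly. The example assumes, hypothetically, that the staff who perform a function are spread evenly across its production sites; the function name and the even split are illustrative assumptions, not part of the diagram.

```python
from fractions import Fraction


def staff_fraction_to_relocate(production_sites: int) -> Fraction:
    """Share of staff displaced by a single-site outage.

    Assumes staff are divided evenly across all production sites,
    which is a simplification made for illustration.
    """
    if production_sites < 1:
        raise ValueError("at least one production site is required")
    return Fraction(1, production_sites)


# More sites means a smaller share of people to move after one site fails.
for sites in (1, 2, 4):
    share = staff_fraction_to_relocate(sites)
    print(f"{sites} site(s): {float(share):.0%} of staff must relocate")
```

Under this assumption a single-site function must relocate everyone, while a function spread across four sites must relocate only a quarter of its staff, which is consistent with a shorter upper boundary for RTO at higher resilience levels.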
Examples