What would the failure of one or more critical applications
cost your business? That’s just the first of several important questions you need to answer to create a disaster recovery strategy
that gets your critical applications back online in the event of an
outage – and that keeps your company from becoming a scary
statistic in another analyst report.
What is the likely cost of downtime
and data loss to my company?
The cost of downtime varies widely depending on the
industry you’re in and on the specific application(s) that go
offline. A media company, for example, might lose tens of
thousands of dollars for every hour its customer-facing website is unavailable, whereas an online brokerage might lose
millions of dollars for every hour its online trading application is down.
The only way to accurately determine your cost of downtime is to conduct a business impact analysis and a risk
assessment for each of your business-critical applications.
The business impact analysis should quantify all the possible
costs resulting from downtime, including (but certainly not limited to):
• contractual penalties or lost bonuses, based on
service-level agreements (SLAs) or other contracts with
customers and partners
• lost customers or customer dissatisfaction
• damaged reputation
• loss or delay of new business opportunities
• increased expenses, such as overtime labor,
outsourcing, and travel costs
The business impact analysis lets you put a dollar cost on
every minute, hour, day, or week of downtime for each application so you can determine the specific amount of downtime your
business can tolerate.
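To make the arithmetic concrete, here is a minimal sketch of how an hourly downtime cost might be totaled and turned into a tolerable-downtime figure. The class name, the three cost categories, and every dollar amount are illustrative assumptions, not figures from a real analysis, and the model assumes costs grow linearly with time, which reputational damage and lost customers often do not.

```python
from dataclasses import dataclass


@dataclass
class DowntimeCosts:
    """Hypothetical hourly cost categories for one application (illustrative only)."""
    lost_revenue: float     # sales or ad revenue lost per hour
    sla_penalties: float    # contractual penalties accrued per hour
    extra_expenses: float   # overtime labor, outsourcing, travel per hour

    def per_hour(self) -> float:
        """Total cost of one hour of downtime."""
        return self.lost_revenue + self.sla_penalties + self.extra_expenses


def downtime_cost(costs: DowntimeCosts, hours_down: float) -> float:
    """Cost of an outage lasting hours_down, assuming a constant hourly rate."""
    return costs.per_hour() * hours_down


def max_tolerable_hours(costs: DowntimeCosts, loss_budget: float) -> float:
    """Longest outage that keeps total losses within loss_budget."""
    return loss_budget / costs.per_hour()


# Assumed figures for a customer-facing website.
website = DowntimeCosts(lost_revenue=20_000, sla_penalties=5_000, extra_expenses=1_000)
print(downtime_cost(website, hours_down=4))
print(max_tolerable_hours(website, loss_budget=52_000))
```

Running the same calculation for each business-critical application gives you a comparable per-hour figure to rank them by.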
A risk assessment examines each application’s vulnerability to
the various events that can cause downtime, from the mundane
(hard drive failures, hardware problems, Internet outages, data
connection issues, application crashes) to the cataclysmic (malware
or virus attacks, floods, tornadoes, earthquakes, blackouts,
terror attacks). Each type of event has its own associated recovery
measures and costs.
Together, the business impact analysis and the risk assessment
give you a clear, granular picture of the value of your applications
to your business – and help you determine and prioritize the elements of your disaster recovery strategy.
What are my recovery objectives?
Once you understand the potential costs of downtime and
the likelihood of particular threats, you can start to set recovery
objectives for each of your critical applications. There are three
types of objectives you need to set:
Recovery point objective (RPO). This answers the question, “How
many seconds, minutes, hours, or days of data from the point
of the outage can I afford to lose?” The answer depends upon the
application. If it’s a server that takes registrations for lead generation,
you may be able to tolerate a day or two of lost data based on what
you know about the cost of a lost lead. If it’s your e-commerce
system, a few minutes of data may be all you can stand. If you’re in
finance and you’re talking about a transactional system, you may not
be able to afford ANY data loss. A single lost transaction could cost
your business millions of dollars.
It follows that your RPO determines the frequency with which you’ll
need to replicate data from your production site to your disaster
recovery site. If your recovery point objective for an application is
one hour or less prior to disaster, you’ll need to replicate data at
least hourly. If you can’t afford to lose any data ever, you’ll need to
implement synchronous replication for that application. That is, you’ll
need to have your data written to the disaster recovery site at the
same time it’s being written at your production site.
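The rule above can be sketched as a small function that maps an RPO to a replication mode. The function name, the return shape, and the sample RPO values are assumptions made for illustration; in practice the interval would also need to allow for the time each copy takes to complete.

```python
from datetime import timedelta
from typing import Optional, Tuple


def replication_plan(rpo: timedelta) -> Tuple[str, Optional[timedelta]]:
    """Map an RPO to a replication mode and the longest allowed gap between copies."""
    if rpo < timedelta(0):
        raise ValueError("RPO cannot be negative")
    if rpo == timedelta(0):
        # No data loss is acceptable: writes must reach both sites at the same time.
        return ("synchronous", None)
    # Otherwise data must be replicated at least as often as the RPO.
    return ("asynchronous", rpo)


# Illustrative RPOs loosely based on the examples above (values are assumptions).
for app, rpo in [
    ("lead registration server", timedelta(days=1)),
    ("e-commerce system", timedelta(minutes=5)),
    ("financial transaction system", timedelta(0)),
]:
    mode, interval = replication_plan(rpo)
    print(f"{app}: {mode}, max interval {interval}")
```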
Recovery time objective (RTO). Your recovery time objective
answers the question, “How quickly do I need this application to be
back up and running after a disaster?” This answer is less about
data loss and more about having the application’s functions available again.
This will vary widely depending on the application and who uses it.
Your internal data warehouse may need to come back online in 12
hours; your customer-facing website may need to come back online far sooner.
Recovery capacity objective (RCO). This answers the question,
“How much computing capacity do I need to recover and by when?” It
depends on your collective RTOs, and as a result is usually phased
over time. For example, suppose your IT infrastructure has 50 virtual
machines (VMs) and 10 terabytes (TBs) of storage, and the whole
thing comes crashing down during a power outage or flood. At your
RTO of one hour, you may need 20 VMs and 5 TBs to support your
most critical application. You may be able to get along without the
rest of your capacity for several more days.
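A phased RCO can be worked out by adding up what each application needs as its RTO deadline arrives. The sketch below uses the 50-VM, 10-TB example; the class and function names, the grouping of the remaining workloads into one entry, and the 72-hour deadline standing in for "several days" are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AppNeed:
    """Hypothetical capacity requirement for one application or group."""
    name: str
    rto_hours: float
    vms: int
    storage_tb: float


def capacity_schedule(apps: List[AppNeed]) -> List[Tuple[float, int, float]]:
    """Return (deadline in hours, cumulative VMs, cumulative TB) per RTO deadline."""
    schedule: List[Tuple[float, int, float]] = []
    total_vms, total_tb = 0, 0.0
    for app in sorted(apps, key=lambda a: a.rto_hours):
        total_vms += app.vms
        total_tb += app.storage_tb
        if schedule and schedule[-1][0] == app.rto_hours:
            # Same deadline as the previous entry: merge into one phase.
            schedule[-1] = (app.rto_hours, total_vms, total_tb)
        else:
            schedule.append((app.rto_hours, total_vms, total_tb))
    return schedule


apps = [
    AppNeed("most critical application", rto_hours=1, vms=20, storage_tb=5.0),
    AppNeed("remaining workloads", rto_hours=72, vms=30, storage_tb=5.0),  # assumed split
]
for deadline, vms, tb in capacity_schedule(apps):
    print(f"by hour {deadline}: {vms} VMs, {tb} TB")
```

Each row of the schedule is the capacity your recovery site must be able to supply by that deadline, not the capacity to add at that point.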
What are my application and
Few applications are islands anymore; most draw on several other applications and services, within and sometimes
beyond your IT infrastructure. Your commerce application, for
example, might incorporate an authentication server, a product database, a database or inventory system from a partner or
supplier, an accounting system, and more. You can’t successfully
recover this application without ensuring that the applications
on which it depends are running – or without recovering them,
if they’re not.
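One way to turn a dependency map into a recovery sequence is a topological sort, which lists every service before the applications that rely on it. The sketch below uses the standard-library graphlib module (Python 3.9 or later); the service names and the dependency map itself are hypothetical, loosely modeled on the commerce example above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: each key depends on every service in its set.
dependencies: Dict[str, Set[str]] = {
    "commerce_app": {"auth_server", "product_db", "partner_inventory", "accounting"},
    "auth_server": set(),
    "product_db": set(),
    "partner_inventory": set(),
    "accounting": {"auth_server"},  # assumed for illustration
}


def recovery_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return services in an order where dependencies come before dependents.

    Raises graphlib.CycleError if the dependencies form a loop.
    """
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)
```

The relative order of services that do not depend on each other is not fixed, so such services can be recovered in parallel; only the "dependency first" ordering is guaranteed.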