Preventative Maintenance
A highly structured, robust maintenance program is crucial to preventing
a disaster from impacting the business. You
should have a CMMS (computerized maintenance management system) to keep
track of when maintenance is due as
well as repairs that have been done. This
also helps to identify equipment that has
numerous repairs and may require replacement before it reaches end of life. Only
through a regularly scheduled preventative maintenance program performed by
OEM representatives can you be assured
that a data center is prepared for a disaster.
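The two CMMS functions described above, tracking when maintenance is due and spotting equipment with numerous repairs, can be sketched in a few lines. This is a minimal illustration, not the interface of any particular CMMS product: the Asset record, the 365-day window, and the three-repair threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Asset:
    """Hypothetical CMMS record for one piece of equipment."""
    name: str
    last_service: date
    service_interval_days: int
    repair_dates: list = field(default_factory=list)

    def next_due(self) -> date:
        # Next scheduled service = last service + the service interval.
        return self.last_service + timedelta(days=self.service_interval_days)


def maintenance_due(assets, today):
    """Names of assets whose next scheduled service date has arrived or passed."""
    return [a.name for a in assets if a.next_due() <= today]


def replacement_candidates(assets, today, window_days=365, max_repairs=3):
    """Names of assets with at least max_repairs repairs in the recent window.

    The window and threshold are illustrative assumptions; set them from
    your own maintenance history.
    """
    cutoff = today - timedelta(days=window_days)
    flagged = []
    for a in assets:
        recent = sum(1 for d in a.repair_dates if cutoff <= d <= today)
        if recent >= max_repairs:
            flagged.append(a.name)
    return flagged
```

An asset that appears in both lists is a natural candidate for a replacement discussion before it reaches end of life.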
Batteries are a weak point in any system
and, if not monitored and maintained
properly, they can actually cause an outage
during a loss of utility power. Real-time
monitoring can help not only by reporting when the batteries fall outside the OEM
specifications, but also by performing load
testing to ensure the UPS can support the
critical load. Standard quarterly maintenance on batteries isn’t always enough.
Batteries can and often do fail shortly after
scheduled preventative maintenance.
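As a rough sketch of the reporting side of real-time monitoring, the check below compares one battery reading against OEM limits. The measured parameters (float voltage, internal resistance, temperature) and every limit value are assumptions for illustration; use the figures from your battery OEM's specification sheet.

```python
from dataclasses import dataclass


@dataclass
class BatterySpec:
    """Hypothetical OEM limits; the fields and values are placeholders."""
    min_float_voltage: float
    max_float_voltage: float
    max_internal_resistance_mohm: float
    max_temperature_c: float


@dataclass
class BatteryReading:
    battery_id: str
    voltage: float
    internal_resistance_mohm: float
    temperature_c: float


def out_of_spec(reading, spec):
    """Return the list of parameters that fall outside the OEM limits."""
    problems = []
    if not spec.min_float_voltage <= reading.voltage <= spec.max_float_voltage:
        problems.append("voltage")
    if reading.internal_resistance_mohm > spec.max_internal_resistance_mohm:
        problems.append("internal resistance")
    if reading.temperature_c > spec.max_temperature_c:
        problems.append("temperature")
    return problems
```

Running a check like this on every reading, rather than once a quarter, is what catches a battery that fails shortly after its scheduled maintenance visit.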
The way in which maintenance on critical equipment is planned and executed is
also extremely important. For example, a
critical environment work authorization
program ensures that each element in the
maintenance procedure is reviewed not
only by the local facilities engineering
team but also by a committee consisting of
engineering staff and management across
the enterprise. Maintenance on critical equipment should only be performed
when there is 100 percent confidence in
the method of procedure, the contractors performing the work, and the documented contingency plans. Review your
maintenance records, including associated repairs, to confirm that your program
can predict needed maintenance and prevent
failures.
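One way to picture a work authorization gate is as a checklist that must come back empty before work starts. The sketch below is hypothetical: the reviewer names and the WorkRequest fields are assumptions that mirror the items listed above, not a description of any specific program.

```python
from dataclasses import dataclass, field

# Assumed reviewer groups, based on the two review levels described above.
REQUIRED_REVIEWERS = {
    "local facilities engineering",
    "enterprise review committee",
}


@dataclass
class WorkRequest:
    """Hypothetical maintenance request for critical equipment."""
    description: str
    approvals: set = field(default_factory=set)
    mop_documented: bool = False
    contractors_vetted: bool = False
    contingency_plan_documented: bool = False


def blocking_items(request):
    """Return everything still missing; work proceeds only if this is empty."""
    missing = [
        f"approval: {name}"
        for name in sorted(REQUIRED_REVIEWERS - request.approvals)
    ]
    if not request.mop_documented:
        missing.append("method of procedure")
    if not request.contractors_vetted:
        missing.append("contractor review")
    if not request.contingency_plan_documented:
        missing.append("contingency plan")
    return missing
```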
Predictive maintenance is as important
as preventative maintenance when making end-of-life equipment replacement decisions.
Once again, batteries – specifically their
timely replacement – are a perfect example. In this case, you should consider the
age of the UPS batteries, the OEM-recommended
life expectancy, and
when you will want to replace them. Even
well-maintained equipment will eventually reach the end of its life cycle, which could
lead to a catastrophic failure if there is no
proper predictive planning for replacement.
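The battery replacement decision above comes down to simple date arithmetic. The sketch below assumes you plan replacement some margin ahead of the OEM-recommended life expectancy; the one-year margin is an illustrative default, not an OEM or industry figure.

```python
from datetime import date, timedelta


def planned_replacement_date(installed, oem_life_years, safety_margin_years=1.0):
    """Target replacement date: OEM life expectancy minus a safety margin.

    Uses 365.25 days per year as an approximation.
    """
    days = round((oem_life_years - safety_margin_years) * 365.25)
    return installed + timedelta(days=days)


def replacement_due(installed, oem_life_years, today, safety_margin_years=1.0):
    """True once today has reached the planned replacement date."""
    target = planned_replacement_date(installed, oem_life_years, safety_margin_years)
    return today >= target
```

For example, batteries installed on January 1, 2020 with a five-year OEM life expectancy and a one-year margin would be scheduled for replacement on January 1, 2024.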
For IP network maintenance, you need
to work closely with your network provider to gain an understanding of how long
their equipment has been in service, when
the last failure was, and how recently its
software has been updated. How do they
monitor the health of the devices? Do they
monitor device logs proactively or primarily react to events that occur? Do they