How to Avoid the 5 Key Causes of Solution Outages

The New Year is a chance to look back and, more importantly, to look forward. In 2015, many IT managers will resolve to keep their systems up and running 24/7 without fail. And no wonder: an average service outage can result in $385,000 in related revenue loss.

The costs are significant, but the five leading causes of communications outages are relatively simple.

According to an analysis of our customer base in a recently published white paper, the top five causes of communications outages, along with the percentage of each that could potentially have been prevented had best practices been followed, are:

  1. Power outage: 81%
  2. Lack of routine maintenance: 78%
  3. Hardware failure: 52%
  4. Software bug or corruption: 34%
  5. Network issue or outage: 27%

“Nearly two-thirds of outages resulting from the top five causes, and more than a third of all outages, could have been avoided by using industry-leading outage prevention practices,” writes Joey Fister, Senior Director of Emergency Recovery and CALA Technical Support Services.

In the white paper, Avoiding Outages: Preventive Steps to Avert Five Key Causes, Fister shares insights that can help companies avoid revenue loss and productivity-destroying downtime.

The full white paper is available on avaya.com.

Below are excerpts, including the industry-leading practices and tools that organizations can use to reduce the likelihood of the outage causes we have seen most often in the Avaya customer base:

  1. Power outages. Uninterruptible power supply (UPS) units are essential to keep systems operating through lightning strikes, storms and other power disruptions. But are they adequate? UPS arrays should meet the specifications of the communications and networking systems they support, of course. But as organizations grow, so does the mix of gear relying on UPS systems. Adequate UPS systems, as well as proper grounding of sensitive equipment, are crucial.

    Audits can help determine if facilities can meet power demands and ward off problems. Your service provider should be able to provide the framework for periodic audits or even help you conduct them. Particular attention can be given to hardware that is approaching the end of manufacturer support (EoMS).

  2. Lack of routine maintenance. Just as people understand that a poor diet, lack of exercise and partaking in certain activities can worsen their health, most organizations know that poorly tended systems can fail from lack of proper care. Yet the high percentage of remediable outages (78%) attributed to poor maintenance suggests organizations are underutilizing upkeep, one of the best ways to maintain system uptime. Most equipment emits telltale signs when a problem is approaching. Proactive health checks, disciplined system monitoring and adherence to maintenance schedules can help teams catch those signals, improving the reliability of communications assets.
  3. Hardware failures. Old equipment may chug along today, but it won’t forever. Continued use of those “sweated assets” is an increasingly risky gamble with major consequences should they go bust. If replacement parts or equipment are not available immediately when they fail, the resulting outage can be extended significantly while replacements are located and acquired. An organization needn’t upgrade everything, though. Proactive upgrades of equipment approaching EoMS, audits to verify system redundancy, system health checks, and failover strategies for critical systems can help reduce hardware-based outages.
  4. Software bugs or corruption. Software vendors constantly release fixes and upgrades, but not all organizations are eager to apply them. Some choose to let others occupy the upgrade front lines and endure potential rollout hiccups, then follow along at a safe interval. This strategy breaks down disastrously when an organization suffers an outage that a fix it voluntarily postponed would have prevented. A sound patching strategy, and proactive patching to eliminate known issues, can help maintain software performance and avoid software-related outages.
  5. Network issues or outages. Jitter, delay and latency can be warnings of a possible network outage. In some cases, a simple audit of an organization’s underlying network can identify where such conditions exist. A network diagram can prove indispensable in isolating an outage, speeding resolution by illustrating the relationships among pieces of equipment. And rigorous configuration control processes can help ensure that system changes and refinements do not inadvertently trigger outages and other problems.
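The power audit described under power outages boils down to simple arithmetic: add up the draw of everything plugged into each UPS and compare it with the unit's rating. The sketch below is illustrative only. The `UpsUnit` record, the `audit_ups` function and the 80% headroom threshold are hypothetical assumptions, not part of any Avaya tool or the white paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical inventory record; field names are illustrative assumptions.
@dataclass
class UpsUnit:
    name: str
    rated_watts: float                # real-power rating from the UPS nameplate
    attached_loads_watts: List[float]  # measured or nameplate draw of each attached device

# Assumed policy (not from the white paper): flag any UPS loaded beyond 80%
# of its rating, leaving headroom for gear added as the organization grows.
MAX_LOAD_FRACTION = 0.8

def audit_ups(units: List[UpsUnit]) -> List[Tuple[str, float]]:
    """Return (name, load fraction) for every UPS above the headroom threshold."""
    flagged = []
    for unit in units:
        load = sum(unit.attached_loads_watts)
        fraction = load / unit.rated_watts
        if fraction > MAX_LOAD_FRACTION:
            flagged.append((unit.name, round(fraction, 2)))
    return flagged

if __name__ == "__main__":
    inventory = [
        UpsUnit("comms-room-a", 3000, [900, 750, 600]),  # 2250 W, 75% loaded
        UpsUnit("comms-room-b", 1500, [500, 450, 400]),  # 1350 W, 90% loaded
    ]
    print(audit_ups(inventory))  # [('comms-room-b', 0.9)]
```

Running the check on a schedule, and again whenever equipment is added, is one way to notice that the mix of gear relying on a UPS has outgrown it.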
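For the network warning signs named above, jitter and delay can be measured from packet timestamps. The sketch below uses the interarrival jitter estimator defined in RFC 3550 (section 6.4.1), J += (|D| - J) / 16, where D is the change in transit time between consecutive packets. The function names and the alert thresholds are hypothetical assumptions: the 150 ms delay limit echoes ITU-T G.114 guidance for voice, and the 30 ms jitter limit is a common rule of thumb, not an Avaya figure.

```python
from typing import List

# Assumed alert thresholds (see lead-in); tune for your own network.
DELAY_LIMIT_MS = 150
JITTER_LIMIT_MS = 30

def interarrival_jitter(send_ms: List[float], recv_ms: List[float]) -> float:
    """Running jitter estimate in ms, using the RFC 3550 smoothing formula."""
    if len(send_ms) != len(recv_ms):
        raise ValueError("send and receive lists must be the same length")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: difference in transit time between consecutive packets
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter

def mean_one_way_delay(send_ms: List[float], recv_ms: List[float]) -> float:
    """Average one-way delay; assumes sender and receiver clocks are synchronized."""
    if not send_ms:
        raise ValueError("need at least one packet")
    return sum(r - s for s, r in zip(send_ms, recv_ms)) / len(send_ms)

def network_warnings(send_ms: List[float], recv_ms: List[float]) -> List[str]:
    """Return human-readable warnings when delay or jitter exceed the assumed limits."""
    warnings = []
    delay = mean_one_way_delay(send_ms, recv_ms)
    jitter = interarrival_jitter(send_ms, recv_ms)
    if delay > DELAY_LIMIT_MS:
        warnings.append(f"high delay: {delay:.1f} ms")
    if jitter > JITTER_LIMIT_MS:
        warnings.append(f"high jitter: {jitter:.1f} ms")
    return warnings

if __name__ == "__main__":
    # Packets sent every 20 ms, arriving with a steady 50 ms transit time
    sent = [0, 20, 40, 60]
    received = [50, 70, 90, 110]
    print(network_warnings(sent, received))  # []
```

Jitter only depends on differences in transit time, so it does not need synchronized clocks; the absolute delay figure does, which is why that assumption is called out in `mean_one_way_delay`.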

What best practices do you use to avoid big solution outages?
Follow me on Twitter @Pat_Patterson_V