How to Avoid These 5 Key Solution Outages

The New Year brings new opportunities to look back, and–more importantly–look forward. In 2015, many IT managers will resolve to keep their systems up and running 24/7 without fail. And no wonder: An average service outage can result in $385,000 in related revenue loss.

The costs are significant, but the five causes of communications outages are relatively simple.

According to an analysis of our customers in a recently published white paper, the top five causes of communications outages—and the percentage of those outages that could potentially have been prevented had best practices been followed—are:

  1. Power outage: 81%
  2. Lack of routine maintenance: 78%
  3. Hardware failure: 52%
  4. Software bug or corruption: 34%
  5. Network issue or outage: 27%

“Nearly two-thirds of outages resulting from the top five causes, and more than a third of all outages, could have been avoided by using industry-leading outage prevention practices,” writes Joey Fister, Senior Director of Emergency Recovery and CALA Technical Support Services.

In this white paper, Avoiding Outages: Preventive Steps to Avert Five Key Causes, Fister’s insights can help companies avoid revenue loss and productivity-destroying downtime.

The full white paper is available on

Below are excerpts, including the industry-leading practices and tools that organizations can use to reduce the potential for the outages we have seen most often in the Avaya customer base:

  1. Power outages. Uninterruptible power supply (UPS) units are essential to keep systems operating through lightning strikes, storms and other power disruptions. But are they adequate? UPS arrays should meet the specifications of the communications and networking systems they support, of course. But as organizations grow, so does the mix of gear relying on UPS systems. Adequate UPS systems, as well as proper grounding of sensitive equipment, are crucial.

    Audits can help determine if facilities can meet power demands and ward off problems. Your service provider should be able to provide the framework for periodic audits or even help you conduct them. Particular attention can be given to hardware that is approaching the end of manufacturer support (EoMS).

  2. Lack of routine maintenance. Just as people understand that a poor diet, lack of exercise and partaking in certain activities can worsen their health, most organizations know that poorly tended systems can fail from lack of proper care. Yet the high percentage of remediable outages (78%) attributed to poor maintenance suggests organizations are underutilizing upkeep, one of the best ways to maintain system uptime. Most equipment emits telltale signs when a problem is approaching. Proactive health checks, disciplined system monitoring and observed maintenance schedules can aid in hearing the signal, helping improve the reliability of communications assets.
  3. Hardware failures. Old equipment may chug along today, but it won’t forever. Continued use of those “sweated assets” is an increasingly risky gamble with major consequences should they go bust. If replacement parts or equipment are not available immediately when they fail, the length of the resulting outage can be extended significantly while replacements are located and acquired. An organization needn’t upgrade everything, though. Proactive upgrades of equipment approaching EoMS, audits to verify system redundancy, system health checks, and failover strategies for critical systems can help reduce hardware-based outages.
  4. Software bugs or corruption. While software vendors constantly release fixes and upgrades into the marketplace, not all organizations are eager to apply them. Some choose to let others occupy the upgrade frontlines and endure potential rollout hiccups, then follow along at a safe interval. This strategy breaks down disastrously when an organization suffers an outage that would have been avoided with a fix that it voluntarily chose to postpone. A sound patching strategy, and proactive patching to eliminate known issues, can help maintain software performance and avoid software-related outages.
  5. Network issues or outages. Jitter, delay and latency can be warnings of a possible network outage. In some cases, a simple audit of an organization’s underlying network can identify where such conditions exist. A network diagram can prove indispensable in isolating an outage, speeding resolution by illustrating the relationships among pieces of equipment. And rigorous configuration control processes can help ensure that system changes and refinements do not inadvertently trigger outages and other problems.
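
As a rough illustration of the network item above, a simple audit script might flag latency and jitter from a series of round-trip samples. This is only a sketch: the function name `network_warnings`, the sample values, and the 150 ms and 30 ms thresholds are assumptions chosen for the example, not figures from the white paper.

```python
from statistics import mean
from typing import List


def network_warnings(latencies_ms: List[float],
                     max_latency_ms: float = 150.0,
                     max_jitter_ms: float = 30.0) -> List[str]:
    """Return warning strings for a series of round-trip latency samples.

    The default thresholds are illustrative assumptions, not vendor guidance.
    """
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")

    avg_latency = mean(latencies_ms)
    # Jitter here is the mean absolute difference between consecutive samples.
    jitter = mean([abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])])

    warnings = []
    if avg_latency > max_latency_ms:
        warnings.append(
            f"average latency {avg_latency:.1f} ms exceeds {max_latency_ms:.1f} ms")
    if jitter > max_jitter_ms:
        warnings.append(
            f"jitter {jitter:.1f} ms exceeds {max_jitter_ms:.1f} ms")
    return warnings


# Hypothetical samples: a steady link, then one with large swings.
print(network_warnings([20.0, 22.0, 21.0, 23.0]))
print(network_warnings([20.0, 120.0, 25.0, 140.0]))
```

The second call reports a jitter warning even though its average latency stays under the threshold, which is the kind of early signal an audit is meant to surface.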

What best practices do you use to avoid big solution outages?
Follow me on Twitter @Pat_Patterson_V

Related Articles:

What Activates a Survivable Avaya Communication Manager Server?

I’ve been teaching Avaya Aura Communication Manager (CM) for about 8 years now. When the topic of CM’s survivability strategy comes up, I like to ask students, “What causes a CM-Survivable Core or a CM-Survivable Remote server to become active?” Typically, students respond with, “the CM-Survivable Server takes control when it loses communication with CM-Main.”

I’m always surprised by how many experienced engineers give that wrong answer.

We all agree that CM-Survivable Core (CM-SC) and CM-Survivable Remote (CM-SR) are doing essentially nothing during a “sunny day,” when CM-Main is in full control of the environment. More specifically, no call processing is occurring within them.

To know what is healthy, Avaya depends upon “heartbeats” between various devices. Those “heartbeats” are called many things, such as keep-alives, or sanity checkslots. For the purpose of this discussion, heartbeats are exchanged between CM-Main and survivable CMs, H.248 Media Gateways (MG), IPSI (Internet Protocol Server Interface) circuit packs within Port Networks, H.323 phones, and starting in CM version 7.0, Avaya Aura Media Servers (AAMS). Further, for some devices, CM is responsible for generating the heartbeats, and for other devices, the device itself generates the heartbeats.

First, all of these devices must have registered with CM-Main. This “gatekeeper” function requires that the device present some credentials to ensure its legitimacy, and then CM needs to be told, or to learn, the IP address of that device.

In older CM versions, those devices (except for IPSIs) could register with any of up to 106 CLAN (control local area network) circuit packs. Acting as a front-end processor, the CLAN would collect the registration requests and forward them to CM. Recently, Avaya deprecated the use of CLANs, and wants all devices to register directly to CM at the Processor Ethernet (also known as PROCR) IP address.

So now imagine that CM-Main can no longer communicate with some, or all of these other devices. Perhaps a power outage took CM-Main completely offline, or maybe some network issue separated CM-Main from one or more of these devices. In this “rainy day” scenario, the heartbeats are not being exchanged.

In a well-designed system, each of these devices has a list of alternate CM “gatekeeper” addresses that it could register to. Further, there are several administrable fields that determine how frequently heartbeats are sent, how many heartbeats need to go missing before the device recognizes it has lost communication with CM-Main, and how long the device should wait before registering to another CM.
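
The interplay of those settings can be sketched in a few lines of Python. This is a toy model, not Avaya's implementation: the class `DeviceFailoverTimer`, its field names, the address `cm-survivable.example`, and the timer values are all hypothetical stand-ins for the administrable fields described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeviceFailoverTimer:
    """Hypothetical model of a device (e.g. a media gateway) watching heartbeats."""
    heartbeat_interval_s: float       # how often a heartbeat is expected
    missed_limit: int                 # consecutive misses before the link is declared lost
    failover_wait_s: float            # wait after loss before registering elsewhere
    alternates: List[str] = field(default_factory=list)  # ordered alternate CM addresses
    missed: int = 0
    lost_at: Optional[float] = None   # time at which the link was declared lost

    def heartbeat_received(self) -> None:
        """A heartbeat arrived, so the link to CM-Main is considered healthy."""
        self.missed = 0
        self.lost_at = None

    def interval_elapsed(self, now: float) -> Optional[str]:
        """Call once per heartbeat interval in which no heartbeat arrived.

        Returns the alternate address to register to, or None to keep waiting.
        """
        self.missed += 1
        if self.missed >= self.missed_limit and self.lost_at is None:
            self.lost_at = now
        if self.lost_at is not None and now - self.lost_at >= self.failover_wait_s:
            return self.alternates[0] if self.alternates else None
        return None


# Assumed values: 10 s heartbeats, 3 misses to declare loss, 20 s wait.
timer = DeviceFailoverTimer(10, 3, 20, ["cm-survivable.example"])
for tick in range(1, 6):
    print(tick * 10, timer.interval_elapsed(now=tick * 10))
```

With these assumed values the device declares the link lost at 30 seconds and only picks an alternate at 50 seconds, which shows why the three settings have to be tuned together.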

Most of those settings should be provided to all CM-survivable servers every night when CM-Main automatically performs a ‘save translations.’

With all that as background, here is the answer to my question: It is the Media Gateway, IPSI or AAMS that decides to register to a CM-Survivable. The survivable CMs are passive. Upon registration by one of these three devices, CM-Survivable becomes active. In other words, the survivable CM did not “take control.”

So from this perspective, what is the difference between CM-Survivable Core (SC), formerly known as an Enterprise Survivable Server, and CM-Survivable Remote (SR), formerly known as a Local Survivable Processor? A CM-SC is activated by the registration of an MG, an IPSI, or an AAMS. However, a CM-SR can only be activated by the registration of an MG. IPSIs and AAMSs cannot register to a CM-SR.

“But wait, John,” some students have said. “Registration by an H.323 phone can also cause a CM survivable to become active.” Unfortunately, no. CM will not let an H.323 phone register if CM cannot provide it with a VoIP resource. And those VoIP resources reside within the MGs, Port Networks, and AAMSs. So, only after one of those devices registers to a CM could that CM allow an H.323 phone to register.
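
The activation rules from the last few paragraphs can be summarized as a small Python model. It is a sketch of the logic as described in this article only; the names `Device`, `ACTIVATORS`, and `SurvivableCM` are invented for illustration and do not correspond to any Avaya software interface.

```python
from enum import Enum
from typing import Set


class Device(Enum):
    MEDIA_GATEWAY = "MG"
    IPSI = "IPSI"
    AAMS = "AAMS"
    H323_PHONE = "H.323 phone"


# Which registrations activate each survivable server type, per the text above.
ACTIVATORS = {
    "CM-SC": {Device.MEDIA_GATEWAY, Device.IPSI, Device.AAMS},
    "CM-SR": {Device.MEDIA_GATEWAY},
}


class SurvivableCM:
    """Toy model: the server stays passive until an eligible device registers."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self.registered: Set[Device] = set()

    @property
    def active(self) -> bool:
        # Active only if at least one activating device has registered.
        return bool(self.registered & ACTIVATORS[self.kind])

    def register(self, device: Device) -> bool:
        """Return True if the registration is accepted."""
        if device is Device.H323_PHONE:
            # A phone is accepted only once a VoIP resource is already present.
            if not self.active:
                return False
        elif device not in ACTIVATORS[self.kind]:
            return False  # e.g. an IPSI or AAMS trying to register to a CM-SR
        self.registered.add(device)
        return True


sr = SurvivableCM("CM-SR")
print(sr.register(Device.H323_PHONE))     # phone alone is rejected
print(sr.register(Device.IPSI))           # an IPSI cannot register to a CM-SR
print(sr.register(Device.MEDIA_GATEWAY))  # this registration activates the server
print(sr.active, sr.register(Device.H323_PHONE))
```

Note that the server object never initiates anything: every state change comes from a `register` call made by a device, which mirrors the point that the survivable CM does not "take control."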

To focus on the point that registration by an IPSI, MG or AAMS is what activates a CM-survivable server, I kept this article vague. In future articles, I’ll provide details about how the IPSI, MG and AAMS know when and where to register. I’ll also discuss handling split registration.

The 5 Reasons Why I Joined Avaya Global Support Services

A couple months ago, I found myself around the family kitchen table attempting to explain to my wife why I was considering leaving my current role at Avaya for another. Between requests and “stories” from our three young children, I explained that I was considering joining the leadership team of our support organization as a strategy leader.

Carl Knerr

As I cut some food for my daughter and passed the ketchup to the boys (is there such a thing as too much ketchup?), I told my wife about how Avaya Support has gone through a huge evolution in the last couple of years, earned a lot of recognition, and is now looking to take its delivery to another level.

Although I would very much enjoy having you, oh beloved reader, at our family dinner table, alas, the logistics just do not allow for such wonderful things. While I cannot share some of my wife’s amazing cooking, I will share with you the reasons why I think Avaya Support will continue to be not only an industry-leader, but an innovator in delivering value to our clients.

Downtime is the enemy

91 Percent of Outages Resolved in 2 Hours or Less

We know that system outages are the worst thing that can happen to our clients, and as such, we have dedicated teams trained on restoring down systems as fast as possible. If that isn’t great enough, I’m really excited that they have begun to reach out to our clients when we know they are at risk.

For example, we know that having a recent backup cuts restore time to 2.5 hours instead of 1.5 days. Given the industry average of $110,000 of cost to the customer for every hour of outage, this makes a huge difference to our clients. If a client doesn’t have a recent backup, we reach out and help them implement a backup strategy. These engineers are inspiring in how customer-focused they are.
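
To put rough numbers on that difference, the figures quoted above can be multiplied out. This back-of-the-envelope sketch assumes that "1.5 days" means 36 clock hours of outage and that the hourly cost applies evenly across the whole outage; both are simplifying assumptions, and the function name is illustrative.

```python
HOURLY_OUTAGE_COST = 110_000          # USD per hour, the industry average quoted above
RESTORE_WITH_BACKUP_H = 2.5           # hours
RESTORE_WITHOUT_BACKUP_H = 1.5 * 24   # 1.5 days, assumed to mean 36 clock hours


def outage_cost(hours: float, hourly_cost: float = HOURLY_OUTAGE_COST) -> float:
    """Estimated cost of an outage lasting `hours`, at a flat hourly rate."""
    return hours * hourly_cost


with_backup = outage_cost(RESTORE_WITH_BACKUP_H)
without_backup = outage_cost(RESTORE_WITHOUT_BACKUP_H)

print(f"With a recent backup:    ${with_backup:,.0f}")
print(f"Without a recent backup: ${without_backup:,.0f}")
print(f"Difference:              ${without_backup - with_backup:,.0f}")
```

Under those assumptions the estimate is $275,000 with a recent backup versus $3,960,000 without one, a difference of $3,685,000 for a single outage.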

Death to Rework

Our support team hates rework and that’s why every time one of our wicked-smart engineers finds a new problem, she will document it and publish the solution to instantly. As if that isn’t enough, we use our own Avaya technology, Avaya Automated Chat, to help our customers easily find what they are looking for.


In fact, our implementation of this technology, dubbed “Ava”, has become the face of Avaya. How cool is that? Even when Ava fails to find you a solution, she succeeds by putting you in touch with an Avaya engineer and passing that engineer your full history, so that you don’t have to start all over.

Not Just a Phone Company

As a Gen-X’er myself, I despise talking to customer support on the phone, and Avaya gets that. Not only do we work with customers over the phone, email, or online chat, but last year, we deployed a first-in-the-industry video chat option using our own products.

If you haven’t tried this yet, stop what you’re doing; and just check it out. Wicked cool stuff. Of course the hot topic is support via social media–something I’ve written on–and now you have an opportunity to see what @Avaya_Support can start doing.

Innovative Diagnostics

Perhaps my most passionate topic over the years has been around diagnostics for Avaya products.

Avaya Support continues to raise the bar in the space of diagnostics. I get irritated when I see valuable time of our human experts being used to validate basic settings, gather log files, etc.

We’ve got really exciting technology that leverages the lessons we have learned from years of troubleshooting hundreds of thousands of customer systems, and we embed that into tools that can solve product issues without an engineer, which means our customer gets an issue resolved in a matter of minutes.

I’m excited that we’re not resting on those laurels, but continuing to invest in improvements and all-new tools to keep satisfying our customers.

Satisfied Customers

As a result, in 2014, 92% of our clients indicated that their overall support experience was excellent, very good, or good. Read that last sentence again. 92%! Isn’t that amazing!

When we look at Avaya’s Net Promoter Score, as an entire company, we were at 65 the last quarter (average of 50 over the last 4 quarters), putting us in best-in-class with Amazon and other companies and beating out companies like Cisco, Microsoft, and ShoreTel.

Avaya Client Services

Please don’t just take my word for the impact of the items above. In October 2014, I was proud to join other Avaya Client Services leaders in Las Vegas at the TSW 2014 Conference, hosted by TSIA. Avaya walked away with three awards for our efforts in Avaya Client Services, putting Avaya in TSIA’s STAR Awards Hall Of Fame.

As you may surmise from the above, I was convinced this was the right move for me and I’ve made the shift. As part of this new role, I plan on continuing to bring you stories from Avaya Support as a means to help our customers and partners derive as much value as possible out of their Avaya Support agreements.

*Based on internal metrics in 2014

Four Business Ideas your Organization Should Consider in 2014


As 2013 draws to a close, many organizations are reviewing their results for the year and planning for 2014. With sectors of the global economy recovering at varying paces, companies are looking to see how to be competitive and productive at the same time.

Here are four areas organizations should be looking at, from my point of view:

#1: Learn the best lessons from other industries, not just your own

Often, organizations look to peers and competitors in the same industry for benchmarking financial and operational performance. Don’t forget that best practices and innovation can cross over from one industry to the next. For example, Velcro and Teflon are widely associated with NASA’s space program, although neither was invented there, and both found uses in home improvement and cookware. Also consider:

• If you are in banking and insurance, practices and technologies in retail and e-commerce are often applicable since you are targeting the same consumer. Think about how retail e-commerce changed the face of online banking.
• If you are in the healthcare business, part of the hospital operations model is similar to the lodging business, in terms of customer service and resource management. Think about check-in, room management and physical security processes.
• The technology infrastructure support of large sports venues like the Sochi Olympics is applicable to large education or manufacturing campuses, or small cities. Think about ubiquitous wireless access, peak load management and backup systems.

#2: Forget about showrooming or webrooming – It’s all one big customer experience channel

In the ongoing debate about channels, showrooming (browsing in stores and buying online) and webrooming (browsing online to buy in stores) are constantly in the news, as experts debate which channel will usurp another.

In reality, today’s organizations and consumers have all adapted to communicating and transacting across ALL channels, depending on the scenario. Whether it is email, text message, video conferencing, website self-service, in-person or voice communication, organizations need to be able to communicate and transact to varying degrees across ALL channels.

The key to success now is to maintain a cohesive view of the customer and deliver a consistent brand experience across all channels, one that takes advantage of the benefits of each channel.

#3: The next generation of employees is not like their predecessors

Numerous books and studies have been written about millennials, Generation Me, and children today being raised on technology. I have seen children playing with tablets while still in strollers. I expect solar chargers for child strollers to arrive soon.

Photo: Fisher Price’s Newborn-to-Toddler Apptivity Seat

Employers will need to adapt to these workers, who often own more advanced technology than what corporate IT departments are issuing. Their expectations on connectivity and response time will challenge traditional IT processes and investment models.

To keep the employees motivated and productive, IT needs to drive value and innovations to support the business issues employees are addressing, rather than simply being the gatekeeper of standards and efficiency.

#4: Watch for big disruptions driven by society, demographics and technologies

Finally, as you look into 2014, don’t forget to check for disruptions that can change the nature of your industry. Ten years ago, the ideas of mobile communications, personal video conferencing, and augmented reality were still mostly the stuff of Star Trek. Venerable video rental, book store and film brands found their business models turned upside down.

As new trends and technologies become available to consumers and businesses alike, think about the impact on your business and how to avoid becoming obsolete. Better yet, learn to take advantage of these changes to grow your business.

I hope you all have a great 2014 and I look forward to reading your comments or suggestions on other topics I should explore in the future.