Understanding Avaya Aura Media Server Survivability Settings

My recent articles have explored how Port Networks and H.248 Media Gateways invoke the survivable modes of Avaya Aura Communication Manager (CM). In this article, I describe how the newest actor, Avaya Aura Media Server (AAMS), can also activate a CM-Survivable Core (fka Enterprise Survivable Server) and a CM-Survivable Remote (fka Local Survivable Processor). I will use the generic term CM-Survivable to refer to both the Survivable Core and Survivable Remote servers.

If you follow Avaya’s announcements, then I’m sure you have heard that AAMS is one of the significant enhancements introduced in Avaya Aura 7.0. Actually, AAMS is a mature product that was created at Nortel circa 2003 to provide additional capabilities to their various communication servers. Back then, it was known as the Media Application Server. Since then, it has grown to provide a plethora of services, such as software-based DSPs, many leading voice and video CODECs, announcements, text-to-speech conversion, speech recognition, and DTMF detection.

Those abilities have been available to other formerly-Nortel products. Now they are also available to Communication Manager. Because AAMS has been around for a while, older releases are still in the field; for CM to use it, AAMS must be release 7.7 or newer.

CM accesses the services of AAMS with SIP connections. You start by defining the AAMS as a “media server.” Interestingly, communication from CM to AAMS is defined within a SIP Signaling-Group, but lacks a corresponding Trunk-group. Further, the SIP communication is directly between the two devices and must not traverse a Session Manager. Similarly, AAMS needs to be configured to communicate with CM.

As described in other articles, a CM (either CM-Main or CM-Survivable) server becomes active whenever it controls DSP resources, which happens when either a H.248 Media Gateway (MG) or Port Network (PN) registers to the server. Because the AAMS contains DSP resources, it can also activate a CM server.

The first issue is determining which of up to 313 CM-Survivable (63 Survivable Core + 250 Survivable Remote) servers an AAMS could register to. That begins with an option on the third page of the change survivable-processor form called “Priority with respect to Media Servers.”

If no priority is assigned, then an AAMS cannot register to that CM-Survivable server and make it go active. However, this setting does not prevent a PN or MG from registering to the same CM-Survivable server.

The next challenge is deciding exactly which priority to assign. This requires an analysis of your network topology, an estimation of what network failures are likely and/or most catastrophic, and a ranking of several survivability possibilities depending on how the network might fracture. That plan should drive the placement of resources such as AAMS, PNs, MGs and CM-Survivable servers. It would also suggest which priority to assign to each CM-survivable.

If your environment contains a mix of PNs, MGs and AAMS, you will want a failover strategy that causes as many of them as possible to register to the same CM-Survivable server. That administration needs to apply to your H.323 endpoints as well.

Assignable priorities start at 2 and go up to 9999. Since CM-Main is implicitly assigned priority of 1, it is obvious that larger integers mean lower priorities. By the way, priorities do not need to be assigned sequentially, allowing an administrator to deliberately leave numerical gaps that could be filled in later.
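To make the numbering rule concrete, here is a minimal Python sketch. It is only an illustrative model, not an Avaya tool or API; the server names and the function names are my own assumptions. It checks that each priority falls in the assignable range and sorts the servers so that the lowest number (the highest priority) comes first.

```python
# Illustrative model only; this is not an Avaya API.
CM_MAIN_PRIORITY = 1                    # implicit priority of CM-Main
MIN_PRIORITY, MAX_PRIORITY = 2, 9999    # assignable range described above


def is_assignable(priority: int) -> bool:
    """Return True if the value can be assigned to a CM-Survivable server."""
    return MIN_PRIORITY <= priority <= MAX_PRIORITY


def failover_order(servers: dict) -> list:
    """Sort server names so the lowest number (highest priority) comes first."""
    for name, priority in servers.items():
        if not is_assignable(priority):
            raise ValueError(f"{name}: priority {priority} is outside 2-9999")
    return sorted(servers, key=servers.get)


# Hypothetical server names; gaps of 10 leave room for later additions.
plan = {"sr-branch-a": 30, "sc-datacenter-2": 10, "sr-branch-b": 20}
print(failover_order(plan))
```

Leaving gaps of 10 in the hypothetical plan means a new server could later be slotted in at 15 or 25 without renumbering anything.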

Only if a priority is assigned on page 3 of the change survivable-processor form will you be able to populate the MEDIA SERVER REPORTING LIST on page 4. Effectively, this list identifies which of the potentially 250 AAMS servers could register to this CM-Survivable server.

Each AAMS needs to receive a list of all the CM-Survivable servers it might communicate with. So, CM-Main analyzes all the CM-Survivable entries and compiles a list per AAMS. I speculate that as part of its reporting mechanism, CM-Main then provides each AAMS with its custom list of CM-Survivable servers and their assigned priorities.
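The compilation step can be pictured with a short Python sketch. This is my own model of the idea, not how CM actually stores or transmits the data; the dictionary layout and all server names are hypothetical. It inverts the per-server reporting lists into one priority-ordered list per AAMS, skipping any CM-Survivable entry that has no priority.

```python
from collections import defaultdict

# Hypothetical data: each CM-Survivable entry has a priority (page 3) and a
# Media Server Reporting List (page 4). A priority of None means "unassigned".
survivable_entries = {
    "sc-datacenter-2": {"priority": 10, "media_servers": ["aams-1", "aams-2"]},
    "sr-branch-a": {"priority": 20, "media_servers": ["aams-2"]},
    "sr-branch-b": {"priority": None, "media_servers": []},
}


def compile_lists(entries):
    """Build, for each AAMS, its (priority, server) list in failover order."""
    per_aams = defaultdict(list)
    for server, entry in entries.items():
        if entry["priority"] is None:
            continue  # no priority means no AAMS may register to this server
        for aams in entry["media_servers"]:
            per_aams[aams].append((entry["priority"], server))
    return {aams: sorted(pairs) for aams, pairs in per_aams.items()}


lists = compile_lists(survivable_entries)
for aams, servers in sorted(lists.items()):
    print(aams, servers)
```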

Image 1
Image 2

Next, we need a heartbeat mechanism for AAMS to learn when CM-Main has become unavailable. AAMS periodically sends a status “report” to CM-Main that CM-Main must promptly acknowledge. The Report Interval (RI) determines the frequency of this “heartbeat” (default 60 seconds). The Report Expiration (RE) timer (default 180 seconds) determines how long AAMS will wait for a response from CM-Main.

If the Report Expiration timer expires, the AAMS will look to its list of assigned CM-Survivable servers. It will then work its way down the list, sending status reports to each CM-Survivable server until one responds. That said, the available documentation suggests that each AAMS simultaneously sends reports to all its configured CMs (CM-Main and all assigned CM-Survivable servers). Either way, when a CM-Survivable server receives a report from AAMS telling it that CM-Main is down, that report is effectively a registration that activates the CM-Survivable server.
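The timer logic can be sketched in a few lines of Python. This is a simplified simulation under my own assumptions: I measure the expiration from the last acknowledgment, I model the "work down the list" behavior rather than the simultaneous reporting, and the function and server names are hypothetical. It is not AAMS code.

```python
REPORT_INTERVAL = 60      # seconds between status reports (default)
REPORT_EXPIRATION = 180   # seconds AAMS waits for an acknowledgment (default)


def main_considered_down(last_ack_time, now, expiration=REPORT_EXPIRATION):
    """True once no acknowledgment has arrived within the expiration window."""
    return now - last_ack_time >= expiration


def pick_survivable(ordered_servers, responds):
    """Walk the priority-ordered list and return the first server that answers."""
    for server in ordered_servers:
        if responds(server):
            return server
    return None


# Simulated timeline: CM-Main last acknowledged a report at t=0.
reachable = {"sr-branch-a"}                      # hypothetical reachable servers
servers = ["sc-datacenter-2", "sr-branch-a"]     # already sorted by priority

for now in range(0, 241, REPORT_INTERVAL):
    if main_considered_down(last_ack_time=0, now=now):
        target = pick_survivable(servers, lambda s: s in reachable)
        print(f"t={now}s: CM-Main timed out, reporting to {target}")
        break
    print(f"t={now}s: still waiting on CM-Main")
```

With the defaults, three report intervals pass without an acknowledgment before the AAMS in this model gives up on CM-Main.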

Image 3

If you assign the same priority (for example ‘3’) to two or more CM-Survivable servers, then you need to make sure that each one has unique AAMS assigned to it on the Media Server Reporting List. In other words, no AAMS can be assigned a list of CM-Survivable servers with duplicated priorities.
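A planning spreadsheet can hide this kind of clash, so here is a small Python sketch of the check. It is purely illustrative: the data layout, server names and function name are my assumptions, and CM may well enforce the rule itself at administration time. The function reports every AAMS whose list would contain the same priority more than once.

```python
from collections import defaultdict


def duplicate_priorities(entries):
    """Return {aams: {priority: [servers]}} for every clash on an AAMS's list.

    entries: {survivable server: (priority, [AAMS on its reporting list])}
    """
    seen = defaultdict(lambda: defaultdict(list))
    for server, (priority, media_servers) in entries.items():
        for aams in media_servers:
            seen[aams][priority].append(server)
    return {
        aams: {p: sorted(s) for p, s in by_priority.items() if len(s) > 1}
        for aams, by_priority in seen.items()
        if any(len(s) > 1 for s in by_priority.values())
    }


# Hypothetical plan: two servers share priority 3.
plan = {
    "sr-east": (3, ["aams-1"]),
    "sr-west": (3, ["aams-2", "aams-1"]),   # aams-1 now sees priority 3 twice
}
print(duplicate_priorities(plan))
```

Removing aams-1 from the second reporting list makes the result empty, which is the condition you want before committing the change.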

In a different article, I discussed how the Split Registration Prevention feature (SRPF) works with MGs. I was surprised to learn that it works the same way with AAMS devices.

Failback to CM-Main can be invoked automatically when the ms-recovery-rule threshold is met (i.e. as soon as possible, or at a particular day and time). Alternatively, failback can be invoked manually from CM-Main with the command: enable ms-return.

Another implication of the introduction of AAMS is that it modifies a technical distinction between CM-Survivable Core (SC) and CM-Survivable Remote (SR). Previously, either a PN or an MG could register to a Survivable Core, but only an MG could register to a Survivable Remote. An AAMS can register to either a Survivable Core or a Survivable Remote. In other words, the distinction between the two types of survivable servers is now simply that a PN cannot register to a Survivable Remote server.

With the addition of AAMS in Aura 7, Avaya has introduced some fantastic features. It also added flexibility to the survivability strategies that can be applied to CM.

Related Articles:

Zang Serves Up a Special Delivery for Your Mom this Mother’s Day

Mother’s Day is the one day in the U.S. when the most phone calls are made. According to this cool Mother’s Day Facts site, 122 million calls are made to mothers on Mother’s Day in the United States alone. Considering there are only 85 million mothers in the U.S., Mom must be pretty busy taking calls from her multiple children, and Dad must be busy making reservations at the favorite family restaurant (Mother’s Day remains the top holiday for dining out).

To help make sure Mom gets that special call on Mother’s Day, Zang today announced a Zang-built service for those who 1) are multiple time zones away from Mom (e.g. military, working or studying abroad), 2) just want to send another thoughtful gift to Mom to let her know she’s loved, or 3) frankly, for those who have a track record for forgetting (you know who you are). With the Zang Forget Me Not service, anyone can record a voicemail for their mom before Mother’s Day, designate the date & time the voicemail should be sent, then receive a text confirming the voicemail was delivered. The new service was created using the cloud-based Zang Comms platform as a service, which allows anyone to create communication applications and services just like Forget Me Not.

How does it work, you ask? Simple. First go to www.zang.io/callmom and complete four short steps:

1)  Enter your telephone number
2)  Enter recipient’s telephone number
3)  Pick the time you would like the recording to be delivered
4)  Zang Forget Me Not service will then call your phone number for you to record, review and approve your message for delivery.


Go ahead—give it a try! It’s just one more surprise you can give Mom this Mother’s Day.

Next time you visit Dubai, take public transport

With happiness being a key focus in Dubai, government agencies are looking towards contributing to the goal of raising the quality of life of customers and ensuring public happiness. These agencies are quickly realizing that the key to delivering a better and more personalized experience is technology. Using the latest services and solutions paves the way to guaranteed customer retention and loyalty.

One of the leading organizations in the area of customer care, winning multiple awards for its contact centre operations including a Hamdan bin Mohammed Smart Government Award, is the Roads & Transport Authority (RTA).

The RTA has a wide remit including Dubai’s Metro, public buses, private road vehicle registration, traffic management and more, so it has a diverse customer base negotiating Dubai’s busy transport system, with a volume of customer enquiries to match. It therefore comes as no surprise that the RTA is investing in multiple channels of communications with its customers, to improve standards of service, increase efficiency and gain valuable feedback from its users. It is also looking to technology to help improve the quality of interactions with clients and to improve overall levels of customer satisfaction and engagement. It has utilized a number of different solutions to increase its outreach to customers, and over time the focus of these efforts has evolved, to include voice communications, smart apps and multi-channel engagement.

From a project and operational perspective, RTA has a big focus on alternative smart channels. It offers 173 smart services under nine apps that can help customers complete their transactions with a click of the finger through the automation of the main services the authority provides. It is dedicated to opening up more channels of communication, with an omni-channel strategy that includes delivering services through channels such as self-service kiosks. At present the RTA has deployed around 16 kiosks, which offer smart services to users in RTA service centres, and in future it plans to have around 100 kiosks all over the city. The Authority has a well-established customer care line, which handles enquiries across the range of its activities, running on Avaya contact centre solutions. In 2015, the centre handled over 2.5 million calls, with over 80% of calls responded to in 20 seconds, and 90% of issues resolved in one call.

To make this possible, last year the contact centre underwent a major technology refresh, to put in place the latest generation of solutions. With Avaya Aura, RTA is now using the most recent software to increase the efficiency of the contact centre. With the aim to deliver the best possible interaction experience to transport customers, Avaya aligned with RTA’s Customer Resource Management strategy to consolidate channels and mediums into RTA’s first, best-in-class contact center to host multi-channel interactions. Among the capabilities that the new technology has enabled is an advanced Interactive Voice Response (IVR) system, which has helped to improve operations by automatically handling some of the more common customer enquiries. On New Year’s Eve the centre received some 12,000 calls, with the IVR handling one third of all enquiries.

The RTA is a pioneering example of how technology can make the difference in delivering quality to customers through the creation of a seamless and hassle-free experience. As we share the RTA’s vision of excelling in customer experiences to achieve happiness, my advice to you is that, next time you visit Dubai, remember to take public transport.

How to Prevent Media Gateway Split Registrations

Back when Avaya Aura Communication Manager 5.2 was released, I recall reading about this new capability called Split Registration Prevention Feature (SRPF). Although I studied the documentation, it wasn’t until I read Timothy Kaye’s presentation (Session 717: SIP and Business Continuity Considerations: Optimizing Avaya Aura SIP Trunk Configurations Using PE) from the 2014 IAUG convention in Dallas that I fully understood its implications.

What is a Split Registration?

First I need to explain what SRPF is all about. Imagine a fairly large branch office that has two or more H.248 Media Gateways (MG), all within the same Network Region (NR). SRPF only works for MGs within a NR and provides no benefit to MGs assigned to different NRs.

Further, imagine that the MGs provide slightly different services. For example, one MG might provide local trunks to the PSTN, and another might provide Media Module connections to analog phones. For this discussion, it does not matter what type of phones (i.e. SIP, H.323, BRI, DCP, or Analog) exist within this Network Region. During a “sunny day,” all the MGs are registered to Processor Ethernet in the CM-Main, which is in a different NR somewhere else in the network. It aids understanding if you believe that all the resources needed for calls within a NR are provided by equipment within that NR.

A “rainy day” is when CM-Main becomes unavailable, perhaps due to a power outage. When a MG’s Primary Search Timer expires, it will start working down the list trying to register with any CM configured on the Media Gateway Controller (MGC) list. All MGs should have been configured to register to the same CM-Survivable server, which by virtue of their registration to it causes CM-Survivable to become active.

Image 1

In this context a CM server is “active” if it controls one or more MGs. A more technical definition is that a CM becomes “active” when it controls DSP resources, which only happens if a MG, Port Network (PN) or Avaya Aura Media Server (AAMS) registers to the CM server.

Since all the MGs are registered to the same CM, all resources (e.g. trunks, announcements, etc.) are available to all calls. In effect, the “rainy day” system behaves the same as the “sunny day” with the exception of which CM is performing the call processing. Even if power is restored, only the CM-Survivable is active, and because no MGs are registered to CM-Main it is inactive.

In CM 5.2, SRPF was originally designed to work with splits between CM-Main and Survivable Remote (fka Local Survivable Processor) servers. In CM 6, the feature was extended to work with Survivable Core (fka Enterprise Survivable Server) servers. To treat the two servers interchangeably, I use the generalized term “CM-Survivable.”

A “Split Registration” is where within a Network Region some of the MGs are registered to CM-Main and some are registered to a CM-Survivable. In this case only some of the resources are available to some of the phones. Specifically, the resources provided by the MGs registered to CM-Main are not available to phones controlled by CM-Survivable, and vice versa. In my example above, it is likely some of the phones within the branch office would not have access to the local trunks.

Further, the Avaya Session Managers (ASM) would discover CM-Survivable is active. They would learn of CM-Survivable server’s new status when either ASM or CM sent a SIP OPTIONS request to the other. The ASMs then might begin inappropriately routing calls to both CM-Main and CM-Survivable. Consequently, a split registration is even more disruptive than the simple failover to a survivable CM.

What can cause split registrations? One scenario is when the “rainy day” is caused by a partial network failure. In this case some MGs, but not all, maintain their connectivity with CM-Main while the others register to CM-Survivable. Another scenario could be that all MGs failover to CM-Survivable, but then after connectivity to CM-Main has been restored some of the MGs are reset. Those MGs would then register to CM-Main.
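The definition of a split registration lends itself to a simple check. The following Python sketch is only an illustration of that definition: the tuple layout, gateway names and function name are my own assumptions, and nothing here queries a real CM. It groups gateway registrations by Network Region and flags any region whose gateways are spread across more than one CM server.

```python
from collections import defaultdict


def split_regions(registrations):
    """Return the network regions whose gateways are registered to more than one CM.

    registrations: iterable of (network_region, gateway, cm_server) tuples.
    """
    servers_by_region = defaultdict(set)
    for region, _gateway, cm_server in registrations:
        servers_by_region[region].add(cm_server)
    return {r: sorted(s) for r, s in servers_by_region.items() if len(s) > 1}


# Hypothetical snapshot: in region 2 one gateway kept its link to CM-Main.
snapshot = [
    (1, "mg-hq-1", "cm-main"),
    (2, "mg-branch-trunks", "cm-main"),
    (2, "mg-branch-analog", "cm-survivable"),
]
print(split_regions(snapshot))
```

In this made-up snapshot, the analog phones in region 2 would lose access to the local trunks, which is exactly the branch-office scenario described earlier.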

How SRPF Functions

If the Split Registration Prevention Feature is enabled, effectively what CM-Main does is to un-register and/or reject registrations by all MGs in the NRs that have registered to CM-Survivable. In other words, it pushes the MGs to register to CM-Survivable. Thus, there is no longer a split registration.

When I learned that, my first question was how does CM-Main know that MGs have registered to CM-Survivable? The answer is that all CM-Survivable servers are constantly trying to register with CM-Main. If a CM-Survivable server is processing calls, then when it registers to CM-Main it announces that it is active. Thus, once connectivity to CM-Main is restored, CM-Main learns which CM-Survivable servers are active. This is an important requirement: if CM-Main and CM-Survivable cannot communicate with each other, a split registration could still occur.

My second question was how CM forces the MGs back to the CM-Survivable. What I learned was that CM-Main looks up all the NRs for which that Survivable server is administered. The list is administered under the IP network region’s “BACKUP SERVERS” heading. CM-Main then disables the NRs registered to CM-Survivable. That both blocks new registrations and terminates existing registrations of MGs and H.323 endpoints.
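Here is how I picture that lookup, as a minimal Python sketch. It models the behavior described above and is not CM's actual implementation; the mapping, server names and function names are hypothetical. Given the survivable server administered for each NR and the set of survivable servers that have announced themselves as active, it works out which NRs CM-Main would disable and whether a registration from a given NR would be accepted.

```python
def regions_to_disable(backup_servers, active_survivables):
    """Return the network regions CM-Main would disable under this model.

    backup_servers: {network_region: survivable server administered for it}
    active_survivables: servers that reported themselves active to CM-Main
    """
    return sorted(
        region
        for region, server in backup_servers.items()
        if server in active_survivables
    )


def accept_registration(region, disabled_regions):
    """CM-Main rejects gateways and H.323 endpoints from disabled regions."""
    return region not in disabled_regions


backup = {2: "sr-branch-a", 3: "sr-branch-b", 4: "sr-branch-a"}  # hypothetical
disabled = regions_to_disable(backup, active_survivables={"sr-branch-a"})
print(disabled)
print(accept_registration(2, disabled), accept_registration(3, disabled))
```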

Image 2

Once the network issues have been fixed, with SRPF there are only manual ways to force MGs and H.323 endpoints to failback to CM-Main. One fix would be to log into CM-Survivable and disable the NRs. Another would be to disable PROCR on CM-Survivable. An even better solution is to reboot the CM-Survivable server because then you don’t have to remember to come back to it in order to enable NRs and/or PROCR.

Implications of SRPF

Enabling SRPF has some big implications to an enterprise’s survivability design. The first limitation is that within an NR the MGC of all MGs must be limited to two entries. The first entry is Processor Ethernet of CM-Main, and the second the PE of a particular CM-Survivable. In other words, for any NR there can only be one survivable server.

Likewise, all H.323 phones within the NR must be configured with an Alternate Gatekeeper List (AGL) of just one CM-Survivable. The endpoints get that list from the NR’s “Backup Servers” list (pictured above). This also means the administrator must ensure that for each NR all the MGs’ controller lists match the endpoints’ AGL.
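Because this matching has to be maintained by hand, an audit along the following lines may be useful. It is a hypothetical Python sketch: the data would have to be collected from your own gateway and CM administration, and the names and layout are my assumptions. It flags any gateway whose controller list is not exactly CM-Main followed by the single CM-Survivable server administered for its NR.

```python
def mismatched_gateways(region_backup, gateways, main="cm-main"):
    """Return gateways whose MGC list is not [CM-Main, the region's survivable].

    region_backup: {network_region: survivable server on the Backup Servers list}
    gateways: {gateway: (network_region, mgc_list)}
    """
    problems = []
    for name, (region, mgc_list) in sorted(gateways.items()):
        expected = [main, region_backup[region]]
        if list(mgc_list) != expected:
            problems.append(name)
    return problems


# Hypothetical branch office in network region 2.
region_backup = {2: "sr-branch-a"}
gateways = {
    "mg-branch-trunks": (2, ["cm-main", "sr-branch-a"]),
    "mg-branch-analog": (2, ["cm-main", "sr-branch-b"]),          # wrong survivable
    "mg-branch-lab": (2, ["cm-main", "sr-branch-a", "sr-hq"]),    # too many entries
}
print(mismatched_gateways(region_backup, gateways))
```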

Almost always, if SRPF is enabled, Media Gateway Recovery Rules should not be used. However in some configurations enabling both might be desirable. In this case, all MGs must be using an mg-recovery rule with the “Migrate H.248 MG to primary:” field set to “immediately” when the “Minimum time of network stability” is met (default is 3 minutes). Be very careful when enabling both features because there is a danger that in certain circumstances both the SRPF and Recovery Rule will effectively negate each other.

Finally, SRPF only works with H.248 MGs. Port Networks (PN) do not have a recovery mechanism like SRPF to deal with rogue PN behavior.

Enabling SRPF

The Split Registration Prevention Feature (Force Phones and Gateways to Active Survivable Servers?) is enabled globally on the CM form: change system-parameters ip-options.

Image 3

If I had not found Tim Kaye’s presentation, I would not have completely understood SRPF. So, now whenever I come across a presentation or document authored by him, I pay very close attention. He always provides insightful information.