Writing Your First WebRTC Application: Part 1

This is not the article that I intended to post today. The real article, which is still a work in progress, is a step-by-step approach to writing your first WebRTC application. It’s chock-full of JavaScript, WebRTC function calls, and HTML code.

However, it’s proving to be much bigger and more complicated than I expected. It’s not a simple task to take the reader from square one all the way across the board and not require 30 or more pages to get him or her there.

So, like all good elephant feasts, I’ve decided to break this into smaller pieces and spend some quality time on each one of them. My goal is that when all is said and done, you will be able to string the pieces together to get a much better understanding than if I just threw them at you all at once.

Which brings me to Part One. I’ll try not to get too technical (which is a difficult task given the nature of this subject) and present the high-level concepts of writing a WebRTC application. For some of you, this will be enough. For others, it will leave you wanting more.

The Beginning of the Beginning
A WebRTC application can be divided into two halves. I will further divide up those halves, but for now, let’s stick with two.

The first half is the code that runs in a Web browser. This consists of HTML and some form of scripting language. For me, the scripting language is JavaScript, but there are other, less common choices.


The HTML code will handle input from the user and perform all the steps necessary to format the visual aspects of the webpage. This is where you ask the user what he or she wants to do along with defining the text and graphics to be displayed on the page. Of most importance to this discussion, you will use HTML to declare where video will be shown on the page.

For instance, to display CIF (Common Intermediate Format) video you will create a container that is 352 pixels by 288 pixels. QCIF (Quarter CIF) would only need 176 pixels by 144 pixels.


First, I need to say that despite its name, JavaScript is not the same as the Java programming language. JavaScript has its origins way back in the days of Netscape (my very first Web browser) where it was known as LiveScript. It’s an object-oriented scripting language that supports dynamic typing.

Dynamic typing means that you can declare a variable (with the var statement) and use that same variable to hold an integer, a string, or any other JavaScript data type. This is totally contrary to Java’s static typing, where strings, integers, characters, and all other data types are completely separate entities. In Java, assigning an integer to a string variable will yield an error. This is not the case with JavaScript, where it’s a data type free-for-all.
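To make that concrete, here is a small sketch of my own (the variable names are made up for illustration) in which a single var declaration holds a number, then a string, then an object:

```javascript
// One declaration, three different data types over its lifetime.
var value = 42;
var seen = [typeof value];

value = "forty-two";      // the same variable now holds a string
seen.push(typeof value);

value = { answer: 42 };   // and now an object
seen.push(typeof value);

console.log(seen.join(", ")); // number, string, object
```

The equivalent Java code would not compile, because a variable declared as an int can never be handed a String.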

The JavaScript portion of your application contains the page’s variables and run-time logic. Within that logic will be code to create the connection to the signaling server and calls to WebRTC functions.

For example, your application will need to call the WebRTC function RTCPeerConnection.setRemoteDescription(). This will be done within your JavaScript.
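As a rough sketch of what such a call might look like, the function below answers an incoming offer. setRemoteDescription(), createAnswer(), and setLocalDescription() are standard RTCPeerConnection methods. The function name answerRemoteOffer and the sendToPeer callback are my own placeholders for whatever signaling path you end up choosing.

```javascript
// `pc` is an RTCPeerConnection (or any object exposing the same three methods).
// `sendToPeer` is a hypothetical callback that hands the answer to your signaling channel.
async function answerRemoteOffer(pc, offer, sendToPeer) {
  await pc.setRemoteDescription(offer);   // store the caller's session description
  const answer = await pc.createAnswer(); // build our own description in reply
  await pc.setLocalDescription(answer);   // commit to it locally
  sendToPeer(answer);                     // ship it back through signaling
  return answer;
}
```

Passing the peer connection in as a parameter keeps the logic easy to exercise outside a browser, which is handy while you are still learning the call sequence.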

Calls to the signaling server will also be housed within your JavaScript. Although not a requirement of WebRTC, most developers will choose WebSocket as the path from application to server and server back to application.

A WebSocket object is fairly easy to use. Your code will create a connection, send data on the connection, receive data on the connection, and recognize when the connection closes.
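Here is a minimal sketch of that lifecycle. The onopen, onmessage, and onclose handlers and the send() method are part of the standard browser WebSocket object. The function name, the callback names, and the choice of JSON as the message format are assumptions I made for illustration.

```javascript
// `socket` is expected to behave like a browser WebSocket.
// In a browser you might pass: new WebSocket("wss://signaling.example.invalid/ws")
// (that URL is a placeholder, not a real server).
function wireSignalingSocket(socket, { onReady, onSignal, onClosed }) {
  socket.onopen = () => onReady();                                // connection created
  socket.onmessage = (event) => onSignal(JSON.parse(event.data)); // data received
  socket.onclose = () => onClosed();                              // connection closed
  // Return a helper that sends data on the connection.
  return (message) => socket.send(JSON.stringify(message));
}
```

The returned function is what the rest of your JavaScript would call whenever it has something to send to the signaling server.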

The Signaling Server
If you read my article about WebRTC fundamentals (WebRTC for Beginners), you will know that the WebRTC specification does not specify a particular signaling server. It clearly states that one is necessary, but it puts few restrictions on what it is or how it is accessed.

This can be seen as a mixed blessing. By not prescribing what a signaling server is, developers can use the technology that best suits their needs. Do you want to use SIP? Go ahead and use SIP. Do you want to define your own custom signaling that’s perhaps easier to use than SIP? Go for it.

The con is that there is no standard way to perform your signaling. So, unless you can utilize someone else’s work, you are on the hook to write your own signaling server.

No matter whether you procure or write it, a signaling server needs to do two basic things.

  1. Exchange the metadata necessary to perform the signaling. This includes some form of addressing and Session Description Protocol (SDP) for each browser.
  2. Deal with Network Address Translation and firewalls.
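To illustrate the first item only, here is a toy in-memory relay. It is a sketch under my own assumptions rather than a real server: the names createSignalingRelay, join, leave, and relay are invented, the addresses are plain strings, and nothing here touches NAT or firewalls.

```javascript
// A toy relay: each peer registers an address and a delivery callback.
function createSignalingRelay() {
  const peers = new Map(); // address -> deliver(message) callback

  return {
    join(address, deliver) { peers.set(address, deliver); },
    leave(address) { peers.delete(address); },
    // Forward a payload (for example, an SDP offer) to the addressed peer.
    relay(from, to, payload) {
      const deliver = peers.get(to);
      if (!deliver) return false; // unknown address, nothing delivered
      deliver({ from, payload });
      return true;
    },
  };
}

// Usage: "alice" sends an offer to "bob".
const relayDemo = createSignalingRelay();
const bobInbox = [];
relayDemo.join("bob", (message) => bobInbox.push(message));
relayDemo.relay("alice", "bob", { type: "offer", sdp: "placeholder SDP text" });
console.log(bobInbox[0].from); // alice
```

A production signaling server adds authentication, persistence, and a network transport, but the core job of routing metadata by address looks much like this.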

For a closer look at the high-level aspects of a signaling server, please refer to An Introduction to WebRTC Signaling.

Wrapping up Part One
Allow me to summarize what I just wrote.

  • A WebRTC solution consists of two parts – a Web browser application and a signaling server.
  • The Web browser application will consist of HTML and JavaScript.
  • HTML will be used for user input and page display.
  • JavaScript will be used for communication to the signaling server and WebRTC function calls.
  • A signaling server must exist, but WebRTC gives you a lot of leeway as to what it is.

In future installments, I will dig deeper into all these aspects. While you may never write your own WebRTC application, it is important to understand what is happening under the covers. By the time I am finished, I hope that you can do both with relative ease.

Stay tuned for more fun and games!

Related Articles:

Understanding Avaya Internet Protocol Server Interface Resets

I cut my teeth on Port Network (PN) outages when I joined Avaya’s Tier-3 backbone support back in 2006. I was assigned to supporting the S8700 series of duplexed Communication Manager (CM) servers just as CM 3 was being released. Back then, the timers were very tight, and a large percentage of my trouble tickets involved explaining to customers why an IPSI (Internet Protocol Server Interface) had reset and, in turn, caused a port network outage.

Avaya uses several different heartbeat mechanisms so that devices know if they have lost connectivity. In the case of port networks, which means any IPSI-controlled cabinet (such as a G650), the heartbeat is variously known as a Sanity Checkslot, Socket Sanity, or IPSI Sanity. This TCP heartbeat is sent to every IPSI every second by the active CM-main (and in duplex CM also by the standby CM). So, if you were to have CM-duplex and duplicated IPSIs in each of the maximum of 64 port networks (2 CM × 2 IPSI × 64 PN), 256 heartbeats would fly through the network each second.
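The arithmetic is simple enough to check in a few lines. This is only a sketch, and the function and parameter names are mine:

```javascript
// One heartbeat per second from each CM server to each IPSI in each port network.
function heartbeatsPerSecond(cmServers, ipsisPerPortNetwork, portNetworks) {
  return cmServers * ipsisPerPortNetwork * portNetworks;
}

console.log(heartbeatsPerSecond(2, 2, 64)); // 256
console.log(heartbeatsPerSecond(1, 1, 64)); // 64
```

The second call shows the simplex case: one CM server and a single IPSI per port network.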

Originally, the IPSI would react if only three consecutive heartbeats went missing. Starting in CM 3.13, the timer was administrable by an Avaya engineer and in CM 5.0 it became administrable by customers on the CM change system-parameters ipserver-interface form. Now the IPSI Socket Sanity Timeout defaults to 15 seconds (values: 3 to 15 seconds). Data from CM substitutes for missing heartbeats.


Frequently, the cause of missing heartbeats is a duplex mismatch: the IPSI is locked to 100 Mbps/full duplex while the Ethernet switch is set to auto-negotiate (resulting in a half-duplex connection), or vice versa. Also, not enabling quality of service (QoS) to give priority to IPSI traffic, or not segregating the IPSI traffic into a separate physical or virtual LAN, frequently causes problems.

Upon detecting the outage, the IPSI assumes it is sick and reacts by performing a warm reset. During the warm reset, stable calls using resources within the PN stay up. However, new calls cannot be initiated, and established calls cannot transition to another state (e.g., hold), for the obvious reason that there is no connection to CM to manage such transactions. The IPSI’s warm reset generally takes only a few seconds.

If it still doesn’t get heartbeats or data from CM, then after a default of 60 seconds (values: 60 to 120 seconds) the IPSI escalates to a cold reset. All calls using resources within the PN are dropped. On the change system-parameters port-networks form, the PN cold reset delay timer can be modified.

Next, based on the No Service Time Out Interval, the IPSI then waits for a default of 5 minutes (values: 2 to 15 minutes). During that time, while the IPSI is waiting for communication from CM-main, the resources within that PN are unavailable. Note that if one heartbeat gets through, perhaps on a flapping WAN circuit, the timer resets and the countdown starts from the beginning. If the No Service Time Out timer expires, the IPSI then attempts to register to a CM-Survivable Core (SC), formerly known as Enterprise Survivable Servers.
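The escalation described above can be sketched as a simple timeline. The default values come straight from the text, but everything else is an assumption on my part: the names are invented, the three timers are treated as running one after another, and the rule that a single heartbeat restarts the countdown is not modeled.

```javascript
// Defaults quoted in the article; allowed ranges are noted alongside.
const DEFAULT_IPSI_TIMERS = {
  socketSanityTimeoutSec: 15, // 3 to 15 seconds
  coldResetDelaySec: 60,      // 60 to 120 seconds
  noServiceTimeoutMin: 5,     // 2 to 15 minutes
};

// Returns the IPSI's state after `outageSec` seconds without heartbeats or data.
// ASSUMPTION: each timer starts when the previous one expires.
function ipsiStateAfter(outageSec, timers = DEFAULT_IPSI_TIMERS) {
  const warmResetAt = timers.socketSanityTimeoutSec;
  const coldResetAt = warmResetAt + timers.coldResetDelaySec;
  const failoverAt = coldResetAt + timers.noServiceTimeoutMin * 60;

  if (outageSec < warmResetAt) return "in-service";
  if (outageSec < coldResetAt) return "warm-reset";    // stable calls stay up
  if (outageSec < failoverAt) return "cold-reset";     // calls dropped, waiting for CM-main
  return "register-to-cm-sc";
}

console.log(ipsiStateAfter(100)); // cold-reset
console.log(ipsiStateAfter(100, {
  socketSanityTimeoutSec: 15, coldResetDelaySec: 120, noServiceTimeoutMin: 2,
})); // warm-reset
```

The second call uses a longer cold reset delay and a shorter no-service timeout, both within the allowed ranges, to show how the same 100-second outage plays out differently.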


Each IPSI manages its own prioritized list of addresses for up to seven CM-SCs, plus the CM-Main, which is always first on the list. Actually, it is how the CM-SCs are configured that determines the server list for the IPSI. And it is the job of each CM-SC to advertise its own values to the IPSIs so that each IPSI can generate the appropriate list of eight server addresses. A customer can have up to 63 CM-SCs. Note that IPSIs cannot register to CM-Survivable Remote (formerly known as Local Survivable Processors) servers.

The preference setting (System Preferred, Local Preferred, or Local Only), along with a Community Size field and a Priority Score field, determines the server’s priority on each IPSI’s list. How to assign weighting of these values is beyond the scope of this article.


Each server in a CM-Duplex configuration is constantly comparing its health to the other. One statistic among many they compare is how many IPSIs each one can communicate with right now. If the standby server can communicate with more IPSIs than the active server, the standby takes over and makes itself Active. This can cause frequent server interchanges if an unreliable WAN link connection to a PN causes some of the heartbeats to get lost. So, Avaya introduced the option to Ignore Connectivity in Server Arbitration on the change ipserver-interface n form, thereby potentially reducing interchanges.
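Here is a deliberately simplified sketch of that one comparison. Real arbitration weighs many other health statistics, and the data shape, field names, and function name below are all my own inventions for illustration.

```javascript
// Each entry describes one IPSI as seen from both servers.
// `ignoreConnectivity` mirrors the per-IPSI "Ignore Connectivity in Server Arbitration" option.
function standbyShouldTakeOver(ipsis) {
  // IPSIs flagged to be ignored do not count toward either server's total.
  const counted = ipsis.filter((ipsi) => !ipsi.ignoreConnectivity);
  const activeCount = counted.filter((ipsi) => ipsi.reachableFromActive).length;
  const standbyCount = counted.filter((ipsi) => ipsi.reachableFromStandby).length;
  return standbyCount > activeCount;
}

const flakyWanSite = { reachableFromActive: false, reachableFromStandby: true };
const headquarters = { reachableFromActive: true, reachableFromStandby: true };

console.log(standbyShouldTakeOver([
  headquarters,
  { ...flakyWanSite, ignoreConnectivity: false },
])); // true

console.log(standbyShouldTakeOver([
  headquarters,
  { ...flakyWanSite, ignoreConnectivity: true },
])); // false
```

With the option set on the unreliable site, a lost heartbeat there no longer tips the balance toward an interchange.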


I have ignored duplicated IPSIs because I am not a big fan of them. Most of the IPSI-related tickets I’ve received were caused by network issues that duplicated IPSIs would not have protected against.

Based on my experiences, I recommend helping calls in progress stay up as long as they can by delaying the Port Network Cold Reset to 120 seconds. Then I suggest hurrying the registration to a CM-SC by setting the No Service Time Out to 2 minutes.

Although PNs are fading from Avaya’s product mix, they are a solid technology representing 30 years of development. Many customers will rely on them for years to come.

Understanding SIP PRACK for Avaya Aura

As many of my readers know, every few months I teach a two-and-a-half-day class on “all things SIP.” My students are exposed to everything from “why SIP” to the nitty-gritty of SIP requests, responses, and call flows. I even speak about some of the more esoteric topics such as To and From tags, the Replaces header, nonce values, and TR-87.

Included in the esoteric list is the PRACK (Provisional Response Acknowledgement) method. PRACK wasn’t in the original SIP specification and was introduced later in RFC 3262. It came about after it was realized that some user agent servers need to know that a provisional response was received by a user agent client. Before PRACK, 1xx responses sent using UDP might get lost, and the sender would never know. PRACK adds a layer of reliability to an otherwise unreliable call flow.

I previously addressed PRACK in my article “Ducks Go Quack. SIP Goes PRACK.” Although I addressed most of the pertinent material, I was short on examples and real-life call flows. As I walked my most recent students through live calls on my company’s Avaya system, I happened to notice a few PRACKs and decided it was time to update my old article.

The following screenshots were gathered using the Avaya traceSM utility. I simply started traceSM on a live Aura system, let it run for a few minutes, and then stopped it after I noticed a few PRACK messages fly by. This was simply because I was unsure as to when Avaya uses PRACKs and when it does not.  In other words, “When in doubt, trace it out.”


Let’s start at the beginning. PRACK messages aren’t just sent out of the blue. The sender of an INVITE message must indicate that it is capable of sending PRACKs. It does that by including the following header in the INVITE message:

Supported: 100Rel

This tells the recipient that, if requested, it will send PRACK messages for 1xx Responses.

The following shows an INVITE with such a header.


Now that the user agent server knows that PRACK messages are possible, it will include headers similar to the following in all 1xx Responses it wants to be PRACKed:

Require: 100Rel

Rseq: 1

The Require header with a value of 100Rel tells the user agent client (the sender of the INVITE) that a PRACK is expected for this response. It’s important to know that the user agent server (the sender of the Response messages) has to request the PRACK. It’s not an automatic process and must be initiated with an Rseq header.

The value in Rseq is used by the user agent client when it creates a PRACK message. The user agent server is responsible for setting and incrementing this number.
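A small sketch of the server side of this negotiation follows. It assumes headers are held in a plain object keyed by header name, which is my simplification. SIP header names and option tags are case-insensitive, so the sketch uses the RFC 3262 spellings (RSeq, 100rel) and compares tags without regard to case.

```javascript
// True when a comma-separated header value lists the given option tag.
function hasOptionTag(headerValue, tag) {
  return (headerValue || "")
    .split(",")
    .some((item) => item.trim().toLowerCase() === tag.toLowerCase());
}

// Headers a user agent server would add to a 1xx response it wants PRACKed.
// Returns an empty object when the INVITE did not advertise PRACK support.
function reliableProvisionalHeaders(inviteHeaders, rseq) {
  if (!hasOptionTag(inviteHeaders["Supported"], "100rel")) {
    return {};
  }
  return { Require: "100rel", RSeq: String(rseq) };
}

console.log(reliableProvisionalHeaders({ Supported: "100Rel" }, 1));
// { Require: '100rel', RSeq: '1' }
```

The server would bump the rseq argument by one for each further reliable response it sends within the same transaction.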

The following 180 Ringing indicates that it expects a PRACK.


Upon receipt of this 180 Ringing, the user agent client must respond with a PRACK message. Of interest to this article is the Rack header. This header must contain the Rseq value sent in the previous 180 Ringing. Additionally, it will indicate the original INVITE session’s CSeq number. Look back at the INVITE in this call flow, and you will see a CSeq value of 1 (one). Therefore, the Rack will look as follows:

Rack: 1 1 INVITE
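Building that header is a matter of gluing three values together: the Rseq from the provisional response, the CSeq number of the original request, and the method. The function name below is mine, and the sketch uses the RFC 3262 capitalization, RAck (header names are case-insensitive in SIP).

```javascript
// Compose the acknowledgement header for a PRACK request.
function buildRAck(rseq, originalCSeq, method = "INVITE") {
  return `RAck: ${rseq} ${originalCSeq} ${method}`;
}

console.log(buildRAck(1, 1)); // RAck: 1 1 INVITE
```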


Next, the user agent server will send a 200 Ok for the PRACK. This tells the user agent client that the PRACK was received and processed.


For grins, I will now show you the 200 Ok for the original INVITE. Note that it does not have an Rseq header and 100Rel is not in the Require header. Why not? That’s because this is not a provisional response. PRACKs are only sent for 1xx responses.
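The client’s decision can be sketched as one small test. The function name and the plain-object header representation are my own assumptions. Note that RFC 3262 excludes 100 Trying, so only responses from 101 through 199 qualify.

```javascript
// Decide whether a response obliges the user agent client to send a PRACK.
function expectsPrack(statusCode, headers) {
  const provisional = statusCode > 100 && statusCode < 200; // 100 Trying is never sent reliably
  const requiredTags = (headers["Require"] || "")
    .split(",")
    .map((tag) => tag.trim().toLowerCase());
  return provisional && requiredTags.includes("100rel") && "RSeq" in headers;
}

console.log(expectsPrack(180, { Require: "100rel", RSeq: "1" })); // true
console.log(expectsPrack(200, {}));                               // false
```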


Mischief Managed

Before I close things out, I want to address the question I hinted at near the top of this article.  When does Avaya use PRACK?

While I honestly don’t know all the permutations, it appears that an INVITE from an Avaya endpoint will always indicate that it supports PRACK (Supported: 100Rel).  However, as you just learned, it’s the recipient of the INVITE that indicates if PRACK messages are required.

In the example above, the Avaya Modular Messaging voice mail server requests PRACK messages. Additionally, PRACK is used when direct media is enabled.

There is a good chance that PRACK is used in other situations, but I am going to have to start up a few more traceSM sessions to learn where they show up.

That’s about all I really need to say about PRACK. I invite you to take a look at the RFC if you want to learn about any PRACK subtleties I might have missed, but for all practical purposes, I’ve said all that needs to be said. I hope you had as much fun today as I did. As is often the case, I learned something in the process of writing this article, and that’s always a good thing.

Understanding Avaya Aura SIP Registration

“Let’s start at the very beginning/a very good place to start/when you read you begin with A B C/when you sing you begin with Do Re Mi.”

I have always loved musicals, and Rodgers and Hammerstein’s “The Sound of Music” is high on my list of favorites. Sure, it’s corny and far from historically accurate, but that doesn’t bother me in the least. I’m always willing to set aside any sense of reality for good singing, romance and adventure, and “The Sound of Music” has them all.

So … what does this have to do with unified communications? REGISTER, of course. Like Do Re Mi, you begin SIP with REGISTER.

This article is a continuation of the concepts I presented in A Close Look at Avaya Aura IMS Call Processing and An Even Closer Look at Avaya Aura IMS Call Processing, and I’d suggest you take a look at those before tackling this one.

Can you get SIP devices to communicate without REGISTER? Absolutely. In fact, when I teach my SIP class, the students put their SIP clients into point-to-point mode, which does not require REGISTER. This means that clients send SIP requests and responses directly to the other clients, not through a proxy. The clients can do everything all by themselves.

However, point-to-point without REGISTER has a serious drawback. The clients are required to know the IP addresses of all the other clients they wish to communicate with. While this is fine in a limited classroom environment, it becomes unwieldy after you grow beyond a handful of endpoints.

As an analogy, imagine having to know the IP address of everyone you wanted to send an email to. That’s the same problem you have if you don’t use REGISTER. It’s simply not practical.

The Tie that Binds

REGISTER associates a user’s identification, or Address of Record (AOR), with one or more locations. Note that I said locations. You are not limited to registering an AOR to a single device. Personally, I routinely register my AOR to a physical desk phone and multiple SIP soft-clients. Avaya Aura supports up to ten such registrations per user. That’s enough to make even the most device-crazy nerd happy.

You bind an AOR to an IP address with a Contact header.  For example, one of my soft clients might tell a SIP registrar that aprokop can be reached at with this Contact header.

Contact: Andrew Prokop <SIP:aprokop@>

Registrations are time-based and will eventually expire. This requires the client to periodically refresh a REGISTER with a new REGISTER. Actually, new isn’t the correct word to use for this. Subsequent REGISTER messages must contain the same Contact, To, From, Call-ID, and From tag as the original registration. This allows the SIP registrar to know that it’s simply a refresh and not a new registration for the same AOR.

Note that CSeq will increment with each REGISTER sent.
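A registrar’s refresh test, as described above, might be sketched like this. The object shape and field names are mine, and the addresses are placeholders from the documentation ranges, not real values.

```javascript
// A REGISTER is a refresh when the identifying fields match and CSeq has moved forward.
function isRefresh(original, next) {
  const sameIdentity = ["contact", "to", "from", "callId", "fromTag"]
    .every((field) => original[field] === next[field]);
  return sameIdentity && next.cseq > original.cseq;
}

const firstRegister = {
  contact: "sip:alice@192.0.2.10",   // placeholder address
  to: "sip:alice@example.invalid",
  from: "sip:alice@example.invalid",
  callId: "abc123",
  fromTag: "tag-1",
  cseq: 1,
};

console.log(isRefresh(firstRegister, { ...firstRegister, cseq: 2 }));                    // true
console.log(isRefresh(firstRegister, { ...firstRegister, callId: "zzz999", cseq: 2 })); // false
```

The second call fails the test because a different Call-ID marks it as a separate registration.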

Keeping Things Secure

I might tell my communications system that I am Andrew Prokop, but it would be foolish to trust me at face value. That’s why SIP allows a REGISTER to be challenged.

Before I go through a REGISTER challenge, allow me to define something known as a nonce.

Nonce stands for Number Once and is an arbitrary number used only once in a cryptographic communication. The recipient of a nonce will use it to encrypt his or her credentials (strictly speaking, SIP digest authentication hashes the credentials together with the nonce rather than encrypting them). Number Once refers to the fact that encryption with this nonce can only be done one time. If someone were to sniff the LAN and obtain someone’s encrypted password, it wouldn’t do them any good because it can only be used in a single transaction. It becomes stale and useless immediately after its first use.

A REGISTER flow is fairly simple and follows these steps:

  1. A user sends a REGISTER to the SIP registrar. For Avaya Aura, this is a Session Manager. The To and From headers contain the user’s AOR. The user specifies the number of seconds the registration should be valid in the Expires header. This value can be later raised or lowered by the registrar.
  2. The registrar returns a 401 Unauthorized response with a WWW-Authenticate header.  This header contains data that must be used to encrypt the user’s communications password. Specifically, it contains a nonce along with the name of the encryption algorithm that the client must use.
  3. The user sends a second REGISTER to the SIP registrar. This REGISTER contains an Authorization header. Within Authorization is the user’s encrypted password.
  4. If the correct password is received by the registrar, a 200 Ok response is sent to signify a successful registration. An Expires header may be present with a different value than what the user requested. This is the time the registration will be valid as determined by the registrar’s policies.

A registration is removed by sending a REGISTER with an Expires header value of 0 (zero).
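The four steps, plus removal, can be walked through with a toy in-memory registrar. This is a sketch under several assumptions and not how Session Manager is implemented: the names are invented, the nonce is a simple counter (real nonces must be unpredictable), the credential check is delegated to a hypothetical checkCredentials callback instead of a real digest computation, and the registrar policy only ever lowers the requested Expires value.

```javascript
function createRegistrar({ checkCredentials, maxExpires = 3600 }) {
  const bindings = new Map();          // AOR -> Map of contact -> granted seconds
  const outstandingNonces = new Set(); // nonces issued but not yet used
  let counter = 0;

  // Step 2: challenge with a fresh nonce.
  function challenge() {
    const nonce = `nonce-${++counter}`; // toy value; real nonces are unpredictable
    outstandingNonces.add(nonce);
    return { status: 401, wwwAuthenticate: { nonce } };
  }

  function handleRegister({ aor, contact, expires, authorization }) {
    if (!authorization) return challenge(); // step 1 arrives without credentials

    // Step 3: a nonce is honored once, and only if this registrar issued it.
    const { nonce, response } = authorization;
    const fresh = outstandingNonces.delete(nonce);
    if (!fresh || !checkCredentials(aor, nonce, response)) return challenge();

    // Step 4: accept, applying the registrar's own limit to Expires.
    const granted = Math.min(expires, maxExpires);
    const contacts = bindings.get(aor) || new Map();
    if (granted === 0) contacts.delete(contact); // Expires of 0 removes the binding
    else contacts.set(contact, granted);
    bindings.set(aor, contacts);
    return { status: 200, expires: granted, contacts: [...contacts.keys()] };
  }

  return { handleRegister };
}
```

A client drives it by sending a request with no authorization, reading the nonce from the 401, and resending the same request with an authorization object that carries the nonce and the computed response. Replaying that second request produces another 401, which is the whole point of the nonce.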

In a picture, we have this.

Using the traceSM tool on an Avaya Aura Session Manager, I captured the following trace that shows a REGISTER, the challenge and a REGISTER with encrypted credentials.  Take a look at the headers, and you’ll see that they’re doing exactly what I said they would do.



In the case of my daily work life, my various SIP devices will each send a REGISTER, be challenged and resend the REGISTER with the encrypted credentials. They periodically refresh their registrations to ensure that I am able to make and receive calls on all my devices until I am finished for the day.

Speaking of finished for the day, that’s about all I have to say about REGISTER. It’s not that complicated once you understand the basics. Just keep in mind that while registration isn’t absolutely mandatory, it enables a secure, scalable and easy to manage SIP solution.

… And these are a few of my favorite things!

Andrew Prokop is the Director of Vertical Industries at Arrow Systems Integration. Andrew is an active blogger and his widely-read blog, SIP Adventures, discusses every imaginable topic in the world of unified communications. Follow Andrew on Twitter at @ajprokop, and read his blog, SIP Adventures.