Writing Your First WebRTC Application: Part 1

This is not the article that I intended to post today. The real article, which is still a work in progress, is a step-by-step approach to writing your first WebRTC application. It’s chock-full of JavaScript, WebRTC function calls, and HTML code.

However, it’s proving to be much bigger and more complicated than I expected. It’s not a simple task to take the reader from square one all the way across the board and not require 30 or more pages to get him or her there.

So, like all good elephant feasts, I’ve decided to break this into smaller pieces and spend some quality time on each one of them. My goal is that when all is said and done, you will be able to string the pieces together to get a much better understanding than if I just threw them at you all at once.

Which brings me to Part One. I’ll try not to get too technical (which is a difficult task given the nature of this subject) and present the high-level concepts of writing a WebRTC application. For some of you, this will be enough. For others, it will leave you wanting more.

The Beginning of the Beginning
A WebRTC application can be divided into two halves. I will further divide up those halves, but for now, let’s stick with two.

The first half is the code that runs in a Web browser. This consists of HTML and some form of scripting language. For me, the scripting language is JavaScript, but there are other, less common choices.


The HTML code will handle input from the user and perform all the steps necessary to format the visual aspects of the webpage. This is where you ask the user what he or she wants to do along with defining the text and graphics to be displayed on the page. Of most importance to this discussion, you will use HTML to declare where video will be shown on the page.

For instance, to display CIF (Common Intermediate Format) video, you will create a container that is 352 pixels by 288 pixels. QCIF (Quarter CIF) would only need 176 pixels by 144 pixels.
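To make those container sizes concrete, here is a minimal sketch that uses the standard CIF and QCIF frame sizes of 352 by 288 and 176 by 144 pixels. The names VIDEO_FORMATS and applyVideoFormat, and the "remoteVideo" id in the comment, are made up for illustration and are not part of WebRTC.

```javascript
// Frame sizes for the two formats mentioned above (width x height in pixels).
const VIDEO_FORMATS = {
  CIF: { width: 352, height: 288 },
  QCIF: { width: 176, height: 144 },
};

// Hypothetical helper: sizes any object that has width and height
// properties, such as an HTML <video> element.
function applyVideoFormat(element, formatName) {
  const format = VIDEO_FORMATS[formatName];
  if (!format) {
    throw new Error(`Unknown video format: ${formatName}`);
  }
  element.width = format.width;
  element.height = format.height;
  return element;
}

// In a browser you might call it like this (the "remoteVideo" id is made up):
// applyVideoFormat(document.getElementById('remoteVideo'), 'CIF');
```

The helper only touches the width and height properties, so it does not care whether the container was declared in HTML or created from script.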


Before I get to the JavaScript half, I need to say that despite its name, JavaScript is not the same as the Java programming language. JavaScript has its origins way back in the days of Netscape (my very first Web browser), where it was known as LiveScript. It’s an object-oriented scripting language that supports dynamic typing.

Dynamic typing means that you can declare a variable (with the var statement) and use that same variable to hold an integer, a string, or any other JavaScript data type. This is totally contrary to Java’s strict typing, where strings, integers, characters, and all other data types are completely separate entities. In Java, assigning an integer to a string variable will yield an error. This is not the case with JavaScript, where it’s a data type free-for-all.
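A short sketch shows the free-for-all in action. The variable names are arbitrary. The typeof operator reports the type of whatever the variable currently holds.

```javascript
// One declaration, several types over its lifetime.
var value = 42;
const seen = [typeof value];   // "number"

value = 'forty-two';
seen.push(typeof value);       // "string"

value = { answer: 42 };
seen.push(typeof value);       // "object"

console.log(seen.join(', '));  // prints: number, string, object
```

The equivalent Java code would not compile, because a variable declared as an int cannot later be assigned a String.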

The JavaScript portion of your application contains the page’s variables and run-time logic. Within that logic will be code to create the connection to the signaling server and calls to WebRTC functions.

For example, your application will need to call the WebRTC function RTCPeerConnection.setRemoteDescription(). This will be done within your JavaScript.
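As a rough sketch of where that call might sit, the function below applies an offer received from the other browser and builds the answer to send back. The function name answerRemoteOffer is my own invention. The three methods it calls (setRemoteDescription, createAnswer, and setLocalDescription) belong to RTCPeerConnection. How the offer arrives and how the answer is delivered are left to the signaling server.

```javascript
// Hypothetical helper: apply an offer received from the other browser and
// produce the answer that must be sent back through the signaling server.
// `peerConnection` is expected to be an RTCPeerConnection (or anything
// that offers the same three methods).
async function answerRemoteOffer(peerConnection, remoteOffer) {
  // Tell WebRTC what the other side proposed (its session description).
  await peerConnection.setRemoteDescription(remoteOffer);

  // Build our side of the session and make it our local description.
  const answer = await peerConnection.createAnswer();
  await peerConnection.setLocalDescription(answer);

  return answer;
}
```

Taking the connection as a parameter, rather than creating it inside the function, keeps the sketch independent of how the rest of the page is organized.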

Calls to the signaling server will also be housed within your JavaScript. Although not a requirement of WebRTC, most developers will choose WebSocket as the path from application to server and server back to application.

A WebSocket object is fairly easy to use. Your code will consist of calls to create a connection, send data on the connection, receive data on the connection, and recognize when the connection closes.
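Those four operations can be sketched in a few lines. Everything named here is an assumption for illustration: the openSignalingChannel function, the handler names, and the choice to encode messages as JSON. The third parameter exists only so the sketch can be exercised with a stand-in class outside a browser.

```javascript
// Hypothetical wrapper around a WebSocket used as the signaling path.
// Messages are assumed to be JSON text; WebRTC does not require that.
function openSignalingChannel(url, handlers, WebSocketImpl = globalThis.WebSocket) {
  const socket = new WebSocketImpl(url);            // create the connection

  socket.onopen = () => handlers.onOpen?.();        // connection is ready
  socket.onmessage = (event) => {                   // receive data
    handlers.onMessage?.(JSON.parse(event.data));
  };
  socket.onclose = () => handlers.onClose?.();      // connection closed

  return {
    send: (message) => socket.send(JSON.stringify(message)),  // send data
    close: () => socket.close(),
  };
}

// In a browser you might call it like this (the URL is made up):
// const channel = openSignalingChannel('wss://example.invalid/signal', {
//   onMessage: (message) => console.log('from server:', message),
// });
```

Note that a real page should wait for the open event before sending, since a WebSocket cannot send data while it is still connecting.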

The Signaling Server
If you read my article about WebRTC fundamentals (WebRTC for Beginners), you will know that the WebRTC specification does not specify a particular signaling server. It clearly states that one is necessary, but it puts few restrictions on what it is or how it is accessed.

This can be seen as a mixed blessing. By not prescribing what a signaling server is, developers can use the technology that best suits their needs. Do you want to use SIP? Go ahead and use SIP. Do you want to define your own custom signaling that’s perhaps easier to use than SIP? Go for it.

The con is that there is no standard way to perform your signaling. So, unless you can utilize someone else’s work, you are on the hook to write your own signaling server.

No matter whether you procure or write it, a signaling server needs to do two basic things.

  1. Exchange the metadata necessary to perform the signaling. This includes some form of addressing and Session Description Protocol (SDP) for each browser.
  2. Help the browsers deal with Network Address Translation (NAT) and firewalls, typically by relaying the ICE candidates that each browser gathers.
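To give a feel for the first item, here is a minimal in-memory sketch of the relaying part of a signaling server. It is not a real server, and nothing in it comes from the WebRTC specification: the createSignalingRelay name, the client ids, and the { from, to, type, payload } message shape are all assumptions. A real server would register a client when its connection opens and deliver messages over that connection.

```javascript
// Hypothetical in-memory relay: maps a client id to a function that
// delivers a message to that client (for example, a WebSocket send).
function createSignalingRelay() {
  const clients = new Map();

  return {
    register(id, deliver) {
      clients.set(id, deliver);
    },
    unregister(id) {
      clients.delete(id);
    },
    // Forward a message such as { from, to, type, payload } to its addressee.
    // Returns false when the addressee is not connected.
    route(message) {
      const deliver = clients.get(message.to);
      if (!deliver) return false;
      deliver(message);
      return true;
    },
  };
}
```

The relay never looks inside the payload, so the same route call could carry an SDP offer, an SDP answer, or the connectivity information the browsers exchange to get through NAT and firewalls.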

For a closer look at the high-level aspects of a signaling server, please refer to An Introduction to WebRTC Signaling.

Wrapping up Part One
Allow me to summarize what I just wrote.

  • A WebRTC solution consists of two parts – a Web browser application and a signaling server.
  • The Web browser application will consist of HTML and JavaScript.
  • HTML will be used for user input and page display.
  • JavaScript will be used for communication to the signaling server and WebRTC function calls.
  • A signaling server must exist, but WebRTC gives you a lot of leeway as to what it is.

In future installments, I will dig deeper into all these aspects. While you may never write your own WebRTC application, it is important to understand what is happening under the covers. By the time I am finished, I hope that you can do both with relative ease.

Stay tuned for more fun and games!