Connecting customers with WebRTC


One of our long-desired features was an option to connect customers without them having to share their phone numbers. This made us look for a Skype/Hangouts kind of solution. The obvious choice was a ready-made, no-install, no-fuss solution that would not need us to act as an intermediary, so that the conversation stays private and customers can use it easily. WebRTC became the natural choice for this. This blog gives you the technical details to set up and run WebRTC-based communication.

WebRTC is an open-source, peer-to-peer framework that enables Real-Time Communication across the web. It is supported by all major browsers, and a few SDKs are also available. Since we did not want to increase our build size, we went with the browser-based solution.

Breaking down a WebRTC Connection:

Network Address Translation (NAT) gives devices with private IP addresses access to the Internet. NAT allows a single device, such as a router, to act as an agent between the Internet (populated with public IP addresses) and a private network (populated with private IP addresses). A NAT device can use a single public IP address to represent many private IP addresses.

Symmetric NAT not only translates the IP address from private to public (and vice versa), it also translates ports, and it creates a separate mapping for each destination the device talks to. There are various rules as to how that translation and mapping occurs, but it's safe to say that with symmetric NAT you should never expect two different destinations to see the same public IP address/port for the same source.

A STUN (Session Traversal Utilities for NAT) server allows clients to discover their public IP address and the type of NAT they are behind. This information is used to establish the media connection. In most cases, a STUN server is only used during connection setup; once the session has been established, media flows directly between the clients.
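The browser's ICE agent talks to the STUN server for you, so application code never builds these messages. Purely to show what "discovering the public address" involves, the sketch below encodes a STUN Binding request header and decodes an IPv4 XOR-MAPPED-ADDRESS attribute value, following the message layout in RFC 5389. It covers only these two pieces (no attribute parsing, no IPv6, no networking), and the function names are our own.

```javascript
// STUN sketch (RFC 5389 layout): build a Binding request header and decode
// the IPv4 XOR-MAPPED-ADDRESS value a STUN server returns.
const MAGIC_COOKIE = 0x2112a442;

// transactionId: 12 random bytes chosen by the client
// (e.g. crypto.getRandomValues(new Uint8Array(12)) in a browser)
function buildBindingRequest(transactionId) {
  if (transactionId.length !== 12) throw new Error('transaction ID must be 12 bytes');
  const msg = new Uint8Array(20);
  const view = new DataView(msg.buffer); // DataView defaults to big-endian (network order)
  view.setUint16(0, 0x0001);       // message type: Binding request
  view.setUint16(2, 0);            // message length: no attributes follow the header
  view.setUint32(4, MAGIC_COOKIE); // fixed magic cookie
  msg.set(transactionId, 8);       // bytes 8..19
  return msg;
}

// value: the attribute value bytes (after the 4-byte attribute header):
// 1 reserved byte, 1 family byte, 2-byte X-Port, 4-byte X-Address
function decodeXorMappedAddressV4(value) {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (view.getUint8(1) !== 0x01) throw new Error('only IPv4 is handled in this sketch');
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);  // XOR with top 16 bits of the cookie
  const addr = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;   // XOR with the whole cookie
  const ip = [24, 16, 8, 0].map((shift) => (addr >>> shift) & 0xff).join('.');
  return { ip, port };
}
```

The XOR step exists so that NATs which rewrite IP addresses found inside packet payloads cannot corrupt the address the server reports.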


If STUN alone cannot establish the connection, ICE can turn to TURN. Traversal Using Relays around NAT (TURN) is an extension to STUN that allows media to traverse a NAT that does not do the “consistent hole punch” STUN traffic relies on. TURN servers are often needed in the case of a symmetric NAT.
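To see why symmetric NAT defeats STUN, here is a toy model in JavaScript, not a real NAT implementation: an endpoint-independent NAT reuses one public mapping per private address, while a symmetric NAT creates a new mapping per destination. The class name `ToyNat`, the starting port 40000, and all addresses (taken from documentation ranges) are made up for illustration.

```javascript
// Toy model only: compares an endpoint-independent NAT with a symmetric NAT.
class ToyNat {
  constructor(publicIp, symmetric) {
    this.publicIp = publicIp;
    this.symmetric = symmetric;
    this.nextPort = 40000;     // arbitrary first public port handed out
    this.mappings = new Map(); // mapping key -> public port
  }

  // Returns the public "ip:port" that destAddr sees for traffic from privateAddr.
  translate(privateAddr, destAddr) {
    // Endpoint-independent: one mapping per private address.
    // Symmetric: one mapping per (private address, destination) pair.
    const key = this.symmetric ? `${privateAddr}->${destAddr}` : privateAddr;
    if (!this.mappings.has(key)) {
      this.mappings.set(key, this.nextPort++);
    }
    return `${this.publicIp}:${this.mappings.get(key)}`;
  }
}

const client = '192.168.1.10:5000';
const stunServer = '198.51.100.1:3478';
const peer = '203.0.113.7:6000';

for (const symmetric of [false, true]) {
  const nat = new ToyNat('203.0.113.50', symmetric);
  const seenByStun = nat.translate(client, stunServer); // what STUN reports back
  const seenByPeer = nat.translate(client, peer);       // what the peer actually sees
  console.log(symmetric ? 'symmetric' : 'endpoint-independent', seenByStun, seenByPeer);
}
```

With the endpoint-independent NAT both lines show port 40000, so the STUN-learned address is usable by the peer. With the symmetric NAT the peer sees port 40001, so the address learned from STUN is useless and the media has to be relayed through TURN.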

Unlike STUN, a TURN server remains in the media path after the connection has been established. That is why the term “relay” is used to define TURN. A TURN server literally relays the media between the WebRTC peers.

Note: If a connection requires a TURN server, all traffic for that connection goes through this server, which costs you network bandwidth. Not every connection needs it, so it is best to monitor this server's usage from time to time.
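In the browser, STUN and TURN servers are handed to ICE through the `iceServers` list of the configuration passed to `new RTCPeerConnection()` (the `servers` object in step 2 of the flow later in this post). The helper below is a hypothetical sketch; the server URLs and credentials are placeholders, not real servers. Setting `iceTransportPolicy` to `'relay'` restricts the connection to TURN candidates, which is one way to check that your TURN server actually works.

```javascript
// Builds an RTCConfiguration object for new RTCPeerConnection(config).
// All URLs and credentials used with it here are placeholders.
function buildIceConfig({ stunUrls, turnUrls, turnUsername, turnCredential, forceRelay = false }) {
  const iceServers = [{ urls: stunUrls }];
  if (turnUrls && turnUrls.length > 0) {
    // TURN entries carry credentials; STUN entries do not need them
    iceServers.push({ urls: turnUrls, username: turnUsername, credential: turnCredential });
  }
  return {
    iceServers,
    // 'relay': only TURN (relay) candidates are used, handy for testing the relay path.
    // 'all' (the default): ICE may use every candidate type.
    iceTransportPolicy: forceRelay ? 'relay' : 'all',
  };
}

const config = buildIceConfig({
  stunUrls: ['stun:stun.example.com:3478'],
  turnUrls: ['turn:turn.example.com:3478?transport=udp'],
  turnUsername: 'demo-user',
  turnCredential: 'demo-secret',
});
// In the page: var RTCConn = new RTCPeerConnection(config);
```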


Session Description Protocol (SDP) is a standard format for describing the multimedia content of the connection, such as resolution, formats, codecs, encryption, etc., so that both peers can understand each other once data starts flowing.
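To get a feel for what an SDP blob contains, here is a minimal sketch that lists each media section (the `m=` lines) with the codec names from its `a=rtpmap` lines. It is not a full SDP parser, and the sample is a trimmed, hand-written example rather than real browser output; in a page you would pass an RTCPeerConnection's `localDescription.sdp`, or the `sdp` field of a received offer.

```javascript
// Minimal SDP reader: one entry per media section with its codec names.
function summarizeSdp(sdp) {
  const sections = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      // m=<media> <port> <proto> <fmt list>
      const media = line.slice(2).split(' ')[0];
      sections.push({ media, codecs: [] });
    } else if (line.startsWith('a=rtpmap:') && sections.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const encoding = line.split(' ')[1] || '';
      sections[sections.length - 1].codecs.push(encoding.split('/')[0]);
    }
  }
  return sections;
}

// Trimmed, hand-written sample of an audio-only offer
const sampleSdp = [
  'v=0',
  'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111 0',
  'c=IN IP4 0.0.0.0',
  'a=rtpmap:111 opus/48000/2',
  'a=rtpmap:0 PCMU/8000',
].join('\r\n');

console.log(summarizeSdp(sampleSdp));
```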

Offer/Answer and Signal Channel
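Unfortunately, WebRTC can’t create connections without some sort of server in the middle. We call this the Signal Channel: any channel of communication used to exchange information before the connection is set up. WebRTC does not prescribe what this channel is, so it can be a WebSocket, an HTTP endpoint, or anything else both peers can reach.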

The information we need to exchange is the Offer and the Answer, which contain the SDP described above.

Peer A, the initiator of the connection, creates an Offer and sends it to Peer B over the chosen signal channel. Peer B receives the Offer from the signal channel, creates an Answer, and sends it back to Peer A along the same channel.
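As a hedged illustration of this round trip, the sketch below uses a hypothetical in-memory channel (`InMemorySignalChannel`, with made-up peer IDs and a `{ type, sdp }` message shape of our own) in place of a real signaling server. The SDP strings are placeholders for what `createOffer()` and `createAnswer()` would produce.

```javascript
// In-memory stand-in for a signal channel: delivers messages between named peers.
// A real deployment would use a server (WebSocket, HTTP, etc.) instead.
class InMemorySignalChannel {
  constructor() {
    this.handlers = new Map(); // peer id -> message handler
  }
  register(peerId, onMessage) {
    this.handlers.set(peerId, onMessage);
  }
  send(from, to, message) {
    const handler = this.handlers.get(to);
    if (!handler) throw new Error(`unknown peer: ${to}`);
    // Deliver asynchronously, as a network would
    setTimeout(() => handler({ from, ...message }), 0);
  }
}

const channel = new InMemorySignalChannel();
const received = [];

// Peer B answers any Offer it receives
channel.register('peerB', (msg) => {
  received.push(`B got ${msg.type}`);
  if (msg.type === 'offer') {
    channel.send('peerB', msg.from, { type: 'answer', sdp: '<answer SDP from createAnswer()>' });
  }
});

// Peer A just records the Answer
channel.register('peerA', (msg) => {
  received.push(`A got ${msg.type}`);
});

// Peer A initiates
channel.send('peerA', 'peerB', { type: 'offer', sdp: '<offer SDP from createOffer()>' });
```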

ICE candidate
As well as exchanging information about the media (discussed above in Offer/Answer and SDP), peers must exchange information about the network connection. Each such piece of information is known as an ICE candidate and describes one way the peer is able to communicate (directly, or through a TURN server).
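In the browser, each gathered candidate arrives as `event.candidate`, and its `candidate` string follows the SDP candidate-attribute grammar, where the word after `typ` names the candidate type: `host` (a local interface), `srflx` (server reflexive, learned via STUN), `prflx` (peer reflexive) or `relay` (allocated on a TURN server). This sketch, with hypothetical helper names and made-up sample lines, extracts that type and tallies the types a peer gathered, for example to check that your TURN server is producing relay candidates.

```javascript
// Returns the candidate type ("host", "srflx", "prflx", "relay") from an
// ICE candidate line, or null if the line has no "typ" field.
function candidateType(candidateLine) {
  const parts = candidateLine.trim().split(/\s+/);
  const i = parts.indexOf('typ');
  return i >= 0 && i + 1 < parts.length ? parts[i + 1] : null;
}

// Tallies the candidate types found in a list of candidate lines
function countTypes(candidateLines) {
  const counts = {};
  for (const line of candidateLines) {
    const type = candidateType(line) || 'unknown';
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Made-up sample lines in the format browsers produce
const gathered = [
  'candidate:1 1 udp 2122260223 192.168.1.10 5000 typ host generation 0',
  'candidate:2 1 udp 1686052607 203.0.113.50 40000 typ srflx raddr 192.168.1.10 rport 5000 generation 0',
  'candidate:3 1 udp 41885439 198.51.100.20 50000 typ relay raddr 203.0.113.50 rport 40000 generation 0',
];
console.log(countTypes(gathered));
```

Gathered candidates only show which paths are possible. To see which path a live call actually uses (and so whether it is costing TURN bandwidth), the selected candidate pair reported by `RTCPeerConnection.getStats()` is the better source.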

[Diagram: exchange of communication between peers]


With these basic components in place, you can build a custom UI that handles features such as mute, volume adjustment, call accept/decline, and audio-only or audio-and-video calls, to turn it into a complete product.
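Mute is one of the simpler features to build on top of these components. A common approach, sketched here under the assumption that `stream` is the local MediaStream returned by `getUserMedia()`, is to toggle `enabled` on its audio tracks: a disabled audio track sends silence while the track and the connection stay up, so no renegotiation is needed. `setMuted` and `isMuted` are hypothetical helper names.

```javascript
// Mute or unmute the local audio by toggling track.enabled.
function setMuted(stream, muted) {
  for (const track of stream.getAudioTracks()) {
    track.enabled = !muted; // disabled audio tracks transmit silence
  }
}

// True when every local audio track is disabled
// (also true for a stream with no audio tracks at all).
function isMuted(stream) {
  return stream.getAudioTracks().every((track) => !track.enabled);
}
```

A mute button then only needs `setMuted(localStream, !isMuted(localStream))`.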

Implementation flow of WebRTC
When a user starts a WebRTC call to another user, a special description is created called an offer. This description includes all the information about the caller's proposed configuration for the call. The recipient then responds with an answer, which is a description of their end of the call.

In this way, both devices share with one another the information needed to exchange media data. The offer and answer themselves travel over the signal channel, while Interactive Connectivity Establishment (ICE) is the protocol that finds a network path between the two devices, even if they are separated by Network Address Translation (NAT).

Each peer, then, keeps two descriptions on hand: the local description, describing itself, and the remote description, describing the other end of the call.
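Each RTCPeerConnection reports where it is in this exchange through its `signalingState` property. The sketch below models only the plain offer/answer cycle, to show which description each peer sets in which order; the real state machine also covers provisional answers, rollback and re-setting an offer, and `nextSignalingState` is a hypothetical helper, not a browser API.

```javascript
// Simplified model of RTCPeerConnection.signalingState for a basic
// offer/answer cycle. Keys are "<action>:<description type>".
const SIGNALING_TRANSITIONS = {
  stable: {
    'setLocal:offer': 'have-local-offer',   // caller after setLocalDescription(offer)
    'setRemote:offer': 'have-remote-offer', // callee after setRemoteDescription(offer)
  },
  'have-local-offer': { 'setRemote:answer': 'stable' }, // caller applies the answer
  'have-remote-offer': { 'setLocal:answer': 'stable' }, // callee applies its own answer
};

function nextSignalingState(state, action, descriptionType) {
  const next = (SIGNALING_TRANSITIONS[state] || {})[`${action}:${descriptionType}`];
  if (!next) {
    throw new Error(`${action}(${descriptionType}) is not part of this model in state ${state}`);
  }
  return next;
}

// Caller: stable -> have-local-offer -> stable
let caller = 'stable';
caller = nextSignalingState(caller, 'setLocal', 'offer');
caller = nextSignalingState(caller, 'setRemote', 'answer');

// Callee: stable -> have-remote-offer -> stable
let callee = 'stable';
callee = nextSignalingState(callee, 'setRemote', 'offer');
callee = nextSignalingState(callee, 'setLocal', 'answer');
```

Applying an answer while in `stable`, for example, is rejected by browsers too (with an InvalidStateError), which is a common symptom of messages arriving out of order.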

The offer/answer process is performed both when a call is first established and any time the call's format or other configuration needs to change. Whether it's a new call or a reconfiguration of an existing one, these are the basic steps to exchange the offer and answer, leaving out the ICE layer for the moment:

1. The caller captures local media via navigator.mediaDevices.getUserMedia(). This method prompts the user for permission to use a media input, which produces a MediaStream with tracks containing the requested types of media.

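var constraints = { audio: true }; // audio-only call; add video: true for video calls
navigator.mediaDevices.getUserMedia(constraints)
.then(function(stream) {
	/* use the stream */
})
.catch(function(err) {
	/* handle the error: report it through our app's WebRtcWebResponse interface */
	var errobj = {code: err.code, message: err.message};
	WebRtcWebResponse.WebrtcResponse(JSON.stringify(errobj), 'Error', 6);
});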
2. The caller creates an RTCPeerConnection and calls RTCPeerConnection.addTrack() (the older addStream() is deprecated).
The RTCPeerConnection interface represents a WebRTC connection between the local computer and a remote peer. It provides methods to connect to a remote peer, maintain and monitor the connection, and close the connection once it's no longer needed.
Example: var RTCConn = new RTCPeerConnection(servers); // servers: configuration object with the STUN/TURN iceServers list
addTrack():
The RTCPeerConnection method addTrack() adds a new media track to the set of tracks which will be transmitted to the other peer.
Example: stream.getTracks().forEach(track => RTCConn.addTrack(track, stream));
3. The caller calls RTCPeerConnection.createOffer() to create an offer. The createOffer() method of the RTCPeerConnection interface initiates the creation of an SDP offer for the purpose of starting a new WebRTC connection to a remote peer.

RTCConn.createOffer().then(function(offer) {
	/* use the offer: pass it to setLocalDescription() as shown next */
}).catch(function(err) {
	var errobj = {code: err.code, message: err.message};
	WebRtcWebResponse.WebrtcResponse(JSON.stringify(errobj), 'Error', 8);
});
The caller then calls RTCPeerConnection.setLocalDescription() to set that offer as the local description (that is, the description of the local end of the connection).
The RTCPeerConnection.setLocalDescription() method changes the local description associated with the connection. This description specifies the properties of the local end of the connection, including the media format. The method takes a single parameter—the session description—and it returns a Promise which is fulfilled once the description has been changed, asynchronously.

RTCConn.setLocalDescription(offer).then(function() {
	/* the offer is now the local description; send it to the callee */
}).catch(function(err) {
	// An error occurred, so handle the failure to connect
	var errobj = {code: err.code, message: err.message};
	WebRtcWebResponse.WebrtcResponse(JSON.stringify(errobj), 'Error', 9);
});
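4. After setLocalDescription(), the caller's ICE agent starts gathering ICE candidates: host candidates from its own network interfaces, server-reflexive candidates obtained by querying the STUN server, and relay candidates from a TURN server if one is configured.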
5. The caller uses the signaling server (a Node.js endpoint that we created) to transmit the offer to the intended receiver of the call. The recipient receives the offer and calls RTCPeerConnection.setRemoteDescription() to record it as the remote description (the description of the other end of the connection).
6. The RTCPeerConnection.setRemoteDescription() method changes the remote description associated with the connection. This description specifies the properties of the remote end of the connection, including the media format. The method takes a single parameter—the session description—and it returns a Promise which is fulfilled once the description has been changed, asynchronously.
Example:
// CallerOffer: the offer ({ type, sdp }) received over the signaling channel
var desc = new RTCSessionDescription(CallerOffer);
RTCConn.setRemoteDescription(desc).then(function() {
	/* the caller's offer is now the remote description */
}).catch(function(err) {
	// An error occurred, so handle the failure to connect
	var errobj = {code: err.code, message: err.message};
	WebRtcWebResponse.WebrtcResponse(JSON.stringify(errobj), 'Error', 15);
});
7. The recipient does any setup it needs for its end of the call: it captures its local media and attaches each media track to the peer connection via RTCPeerConnection.addTrack().
The recipient then creates an answer by calling RTCPeerConnection.createAnswer(). The createAnswer() method on the RTCPeerConnection interface creates an SDP answer to an offer received from a remote peer during the offer/answer negotiation of a WebRTC connection. The answer contains information about any media already attached to the session, codecs and options supported by the browser, and any ICE candidates already gathered. The answer is delivered to the returned Promise, and should then be sent to the source of the offer to continue the negotiation process.
Example:
RTCConn.createAnswer().then(function(answer) {
	return RTCConn.setLocalDescription(answer);
}).then(function() {
	// Send the answer to the remote peer using the signaling server
	WebRtcWebResponse.WebrtcResponse(JSON.stringify(RTCConn.localDescription), 'answer', 2);
}).catch(function(err) {
	// An error occurred, so handle the failure to connect
	var errobj = {code: err.code, message: err.message};
	WebRtcWebResponse.WebrtcResponse(JSON.stringify(errobj), 'Error', 14);
});
8. The recipient calls RTCPeerConnection.setLocalDescription(createdAnswer) to set the answer as its local description. The recipient now knows the configuration of both ends of the connection.
9. The recipient uses the signaling server to send the answer to the caller.
10. The caller receives the answer and calls RTCPeerConnection.setRemoteDescription() to set it as the remote description for its end of the call. It now knows the configuration of both peers, and once ICE has found a working network path, media begins to flow as configured.
Throughout this exchange, ICE candidates are sent to the other peer as they are gathered. The RTCPeerConnection.onicecandidate property is an event handler that is called whenever an icecandidate event occurs on an RTCPeerConnection instance, which happens whenever the local ICE agent needs to deliver a message to the other peer through the signaling server. This lets the ICE agent negotiate with the remote peer without the browser needing to know anything about the technology used for signaling; simply implement this handler to send the ICE candidate to the remote peer over whatever messaging technology you choose.
RTCConn.onicecandidate = function(event) {
	if (event.candidate) {
		// Send the candidate to the remote peer over the signaling channel
	} else {
		// A null candidate means gathering is complete: all ICE candidates have been sent
	}
};
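The steps above leave out how each peer reacts to messages arriving from the signal channel, including the remote ICE candidates, which must be handed to RTCPeerConnection.addIceCandidate(). The sketch below is one way to wire this up with the promise-based API. It assumes a hypothetical `send(message)` function that posts to your signaling server, a `{ type, sdp }` / `{ type: 'candidate', candidate }` message shape of our own choosing, and that the callee has already added its local tracks (step 7) before `createAnswer()` runs.

```javascript
// Forward every locally gathered ICE candidate to the other peer.
function wireIceCandidates(pc, send) {
  pc.onicecandidate = (event) => {
    // A null candidate marks the end of gathering; nothing to forward
    if (event.candidate) {
      send({ type: 'candidate', candidate: event.candidate });
    }
  };
}

// Handle one message from the signal channel. pc is an RTCPeerConnection.
async function handleSignal(pc, message, send) {
  switch (message.type) {
    case 'offer': {
      // Callee: record the offer, then create, apply and send the answer
      await pc.setRemoteDescription({ type: 'offer', sdp: message.sdp });
      const answer = await pc.createAnswer();
      await pc.setLocalDescription(answer);
      send({ type: 'answer', sdp: pc.localDescription.sdp });
      break;
    }
    case 'answer':
      // Caller: record the answer
      await pc.setRemoteDescription({ type: 'answer', sdp: message.sdp });
      break;
    case 'candidate':
      // Either side: give the remote peer's candidate to the local ICE agent
      await pc.addIceCandidate(message.candidate);
      break;
    default:
      throw new Error(`unknown signal type: ${message.type}`);
  }
}
```

One caveat: addIceCandidate() rejects if no remote description has been set yet, so candidates that can arrive before the offer or answer has been applied should be queued and added afterwards.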

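The above steps should enable successful communication setup, provided each peer also passes the candidates it receives from the other side to RTCPeerConnection.addIceCandidate().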

As mentioned earlier, we went with the browser-based solution and implemented this feature in a WebView. It requires microphone permission from the user.
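When the user declines the microphone prompt, or the microphone is unavailable, getUserMedia() rejects with an error whose `name` tells you why (`NotAllowedError`, `NotFoundError`, `NotReadableError`, and so on). The numeric `err.code` used in the snippets above is a legacy DOMException field and is 0 for newer names such as NotAllowedError, so `err.name` is the more reliable field to branch on. The sketch below maps these standard names to messages; the message wording and the helper name are placeholders of our own.

```javascript
// Maps getUserMedia() rejection names to user-facing messages (placeholder wording).
const MIC_ERROR_MESSAGES = {
  NotAllowedError: 'Microphone access was denied. Please allow it to make calls.',
  NotFoundError: 'No microphone was found on this device.',
  NotReadableError: 'The microphone is in use by another app or could not be started.',
  OverconstrainedError: 'No microphone matches the requested settings.',
  SecurityError: 'Media access is disabled for this page.',
  AbortError: 'The microphone could not be started.',
};

function describeMediaError(err) {
  const name = err && err.name;
  return MIC_ERROR_MESSAGES[name] || `Could not access the microphone (${name || 'unknown error'}).`;
}
```

In step 1, the `.catch()` handler could then show `describeMediaError(err)` to the user in addition to reporting the error to the app.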

Make sure you have retries in your code, as the connection is sometimes not established on the first attempt.
Depending on the clients' networks, it can take up to 10 seconds for the call to connect.
Once the call was underway, we didn’t notice any issues with clarity and didn’t perceive any lag.
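One way to implement that retry, sketched here as an assumption rather than our exact code, is to wait for the peer connection's `connectionState` to become `connected` with a timeout, and to start a fresh attempt if it fails or times out. `startAttempt` is a hypothetical function that sets up a new call (a new RTCPeerConnection plus the offer/answer exchange) and returns the peer connection; the 15-second default timeout is our own choice, leaving headroom over the roughly 10 seconds mentioned above.

```javascript
// Resolves when pc.connectionState becomes 'connected'; rejects on
// 'failed'/'closed' or after timeoutMs. Works with an RTCPeerConnection
// (or any EventTarget exposing connectionState).
function waitForConnected(pc, timeoutMs) {
  return new Promise((resolve, reject) => {
    const finish = (settle, value) => {
      clearTimeout(timer);
      pc.removeEventListener('connectionstatechange', onChange);
      settle(value);
    };
    const onChange = () => {
      if (pc.connectionState === 'connected') finish(resolve);
      else if (pc.connectionState === 'failed' || pc.connectionState === 'closed') {
        finish(reject, new Error('connection failed'));
      }
    };
    const timer = setTimeout(() => finish(reject, new Error('timed out')), timeoutMs);
    pc.addEventListener('connectionstatechange', onChange);
    onChange(); // handle a connection that is already connected or failed
  });
}

// Retries the whole call setup up to `attempts` times.
async function connectWithRetry(startAttempt, { attempts = 3, timeoutMs = 15000 } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    const pc = await startAttempt(i);
    try {
      await waitForConnected(pc, timeoutMs);
      return pc;
    } catch (err) {
      lastError = err;
      pc.close(); // discard the failed attempt before retrying
    }
  }
  throw lastError;
}
```

A lighter alternative in current browsers is RTCPeerConnection.restartIce(), which keeps the existing connection and triggers a new round of ICE negotiation instead of rebuilding everything.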

For further reference: