VoIP Protocols: Introducing SIP

Vladimír Toncar

The Session Initiation Protocol (SIP for short) is a Voice over IP protocol designed by the Internet Engineering Task Force. SIP was created by the MMUSIC group of the IETF (MMUSIC stands for Multi-party Multimedia Session Control). Formally, the protocol is intended for creating, modifying and terminating sessions with one or more participants. The sessions are mainly VoIP telephone calls or conferences.

The first version of SIP was published in 1999 in RFC2543, with Mark Handley and Henning Schulzrinne as the two main authors. The standard was revised in 2002 by RFC3261, which obsoleted RFC2543 while keeping the protocol version at SIP/2.0. Many updates and extensions have followed (RFC3265, RFC3853, RFC4320, RFC4916, RFC5393, RFC5621, RFC5626, RFC5630).

SIP Characteristics

Unlike H.323, SIP is a text-based protocol. The formatting of SIP requests and responses is based on HTTP version 1.1. Endpoints that communicate using SIP use the following three protocols:

SIP itself, used to establish and terminate the session;
Session Description Protocol (SDP for short, RFC2327, obsoleted by RFC4566), used to exchange information about audio/video channels. Like SIP, SDP is also a product of the IETF's MMUSIC group;
RTP, used to send the real-time streams of audio or video across the network.
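
Because SIP is text-based and borrows its layout from HTTP/1.1, a request can be read with ordinary string handling. The following is a minimal sketch rather than a complete parser: the sample INVITE, its addresses (alice@example.com, bob@example.org) and the helper name parse_sip_request are illustrative assumptions, and details such as compact header names or repeated headers are ignored.

```python
# Illustrative INVITE without a body; every name and address is made up.
SAMPLE_INVITE = (
    "INVITE sip:bob@example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc1.example.com;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.org>\r\n"
    "Call-ID: a84b4c76e66710@pc1.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Contact: <sip:alice@pc1.example.com>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_request(text):
    """Split a SIP request into start-line fields, headers and body."""
    # As in HTTP, an empty line separates the headers from the body.
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Start line: METHOD, Request-URI and protocol version.
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, version, headers, body


method, request_uri, version, headers, body = parse_sip_request(SAMPLE_INVITE)
print(method, request_uri, version)  # INVITE sip:bob@example.org SIP/2.0
print(headers["cseq"])               # 1 INVITE
```

In a real call the body of the INVITE would carry the SDP description of the audio/video channels; it is left empty here to keep the sketch short.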

SIP messages are exchanged between endpoints in transactions. A transaction consists of a request and the related response or responses. The messages that belong to the same transaction carry the same sequence number in the CSeq header (RFC3261 additionally matches transactions by the branch parameter of the Via header). Each new request should have its own CSeq number. The exceptions are the ACK message (ACK for "acknowledge") and the CANCEL message, which reuse the CSeq number of the request they apply to.
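
To make the role of CSeq concrete, here is a small sketch of how a client might pair an incoming response with the request that caused it. A CSeq header value consists of a sequence number followed by the method name, and a response copies the CSeq of its request. The helper names parse_cseq and match_response and the pending table are hypothetical; a real SIP stack would also compare the Call-ID and the Via branch parameter.

```python
def parse_cseq(value):
    """Split a CSeq header value such as '1 INVITE' into (number, method)."""
    number, method = value.split()
    return int(number), method


def match_response(pending, response_cseq):
    """Return the pending request whose CSeq equals the response's, or None."""
    return pending.get(parse_cseq(response_cseq))


# Requests sent so far, keyed by their parsed CSeq (hypothetical bookkeeping).
pending = {
    (1, "INVITE"): "INVITE sip:bob@example.org",
    (2, "BYE"): "BYE sip:bob@example.org",
}

print(match_response(pending, "1 INVITE"))  # INVITE sip:bob@example.org
print(match_response(pending, "3 INVITE"))  # None

# The ACK for the INVITE reuses the INVITE's sequence number.
invite_number, _ = parse_cseq("1 INVITE")
ack_number, ack_method = parse_cseq("1 ACK")
print(ack_number == invite_number, ack_method)  # True ACK
```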

SIP can use either UDP or TCP as the underlying transport protocol. Originally (in RFC2543), UDP was the only mandatory option. According to RFC3261 from 2002, all endpoints must be able to send SIP messages over both UDP and TCP. Still, UDP is the more frequently used option. When communicating over TCP, two modes are possible: either the same TCP connection is used for all transactions of a session, or a new TCP connection is established for each individual transaction.
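
The same SIP message can be handed to either transport unchanged. The sketch below illustrates this with Python's standard socket module; the function name send_sip and the sample OPTIONS request are assumptions made for this example, and the TCP branch opens a new connection for each message, which corresponds to the second of the two TCP modes described above. Retransmission, which a real UDP client needs, is left out.

```python
import socket


def send_sip(message, host, port, transport="udp"):
    """Send one SIP message (bytes) over UDP or TCP."""
    if transport == "udp":
        # UDP: the whole message travels in a single datagram.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(message, (host, port))
    elif transport == "tcp":
        # TCP: a fresh connection per message in this sketch.
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(message)
    else:
        raise ValueError("transport must be 'udp' or 'tcp'")


# Illustrative OPTIONS request; all names and addresses are made up.
OPTIONS_PING = (
    "OPTIONS sip:bob@example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc1.example.com;branch=z9hG4bKhjhs8ass877\r\n"
    "Max-Forwards: 70\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.org>\r\n"
    "Call-ID: a84b4c76e66710@pc1.example.com\r\n"
    "CSeq: 1 OPTIONS\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
).encode("ascii")
```

A call such as send_sip(OPTIONS_PING, "10.1.1.10", 5060) would send the request to a server listening on 5060, the default SIP port; the address is only an example.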

SIP Entities

A SIP network can consist of a number of entities. The distinction among the entities is logical: in practice, several of them are often combined in a single SIP server (the most usual case being the Proxy/Registrar server).

Let us name the individual entities:

User Agent: each SIP telephone (whether hardware or software) is seen as a User Agent (UA for short) by SIP. The UA may perform two roles: in the User Agent Client (UAC) role, the UA sends requests. In the User Agent Server (UAS) role, the UA receives requests and sends responses.
Registrar: the task of this server is to receive REGISTER messages from SIP clients. The REGISTER message contains location information (an IP address) for the given SIP client (e.g. a telephone). The registrar keeps the location information in a database and thus knows the IP address (or addresses) for each registered user.
Proxy: The task of a proxy server is to receive SIP requests and forward them to the target SIP endpoint (or to another proxy that is closer to the target). There are several reasons why people might prefer to send SIP messages via a proxy server instead of the direct route. The two most important reasons are (1) location service (the proxy server has access to the location database maintained by the registrar) and (2) enforcing call policies (calls must pass through the proxy if the operator wants to check that the caller is actually allowed to place them).
Redirect server: This is a special server that accepts SIP INVITE requests but reacts to all of them with redirect responses (code 3xx), directing the calling client to the actual location it needs to contact to reach the target endpoint. To perform this role, the redirect server needs access to the location service (the location database maintained by the registrar server). A redirect server could be used in a network with so much traffic that a regular proxy server would be overloaded.
Gateway: A gateway is a server that receives SIP calls and translates them to another telecommunication network (be it the public telephone network, an H.323 network, or even Skype).
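
The relationship between a registrar and a redirect server can be modelled in a few lines. This is only a sketch under simplifying assumptions: the class names Registrar and RedirectServer, the in-memory dictionary standing in for the location database, and the sample addresses are all hypothetical, and no real SIP messages are parsed or sent.

```python
class Registrar:
    """Keeps the location database: registered SIP address -> contact URI."""

    def __init__(self):
        self.bindings = {}

    def register(self, address, contact):
        # A REGISTER request would carry both values; here they are passed in.
        self.bindings[address] = contact


class RedirectServer:
    """Answers every INVITE with a redirect instead of forwarding it."""

    def __init__(self, location_db):
        self.location_db = location_db

    def handle_invite(self, request_uri):
        contact = self.location_db.get(request_uri)
        if contact is None:
            return "SIP/2.0 404 Not Found", None
        # The caller is expected to retry the INVITE at the returned contact.
        return "SIP/2.0 302 Moved Temporarily", contact


registrar = Registrar()
registrar.register("sip:john@somecompany.com", "sip:john@192.0.2.15")
redirect = RedirectServer(registrar.bindings)
print(redirect.handle_invite("sip:john@somecompany.com"))
# ('SIP/2.0 302 Moved Temporarily', 'sip:john@192.0.2.15')
```

Because the redirect server only looks up an address and replies, it keeps no call state, which is why it copes with more traffic than a proxy that forwards every message.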

Locating Other SIP Users

A SIP address (SIP URI) has the form "sip:user@server". The URI can be more complex, but this is the most usual form.
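
As an illustration, the usual "sip:user@server" form can be taken apart with a short regular expression. The sketch below assumes only the simple form plus an optional port; the name parse_sip_uri is hypothetical, and URI parameters, headers and IPv6 address literals are deliberately not handled.

```python
import re

# scheme ":" [user "@"] host [":" port]
SIP_URI = re.compile(r"^(sips?):(?:([^@]+)@)?([^:;?]+)(?::(\d+))?")


def parse_sip_uri(uri):
    """Split a simple SIP URI into scheme, user, host and port."""
    match = SIP_URI.match(uri)
    if match is None:
        raise ValueError("not a SIP URI: " + uri)
    scheme, user, host, port = match.groups()
    return {
        "scheme": scheme,
        "user": user,
        "host": host,
        "port": int(port) if port else None,  # None means the default port
    }


print(parse_sip_uri("sip:john@sip.somecompany.com"))
# {'scheme': 'sip', 'user': 'john', 'host': 'sip.somecompany.com', 'port': None}
```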

If you want to call someone using a SIP telephone, there are two ways to locate the user: by querying the domain name system (DNS) or by relying on a location service.

1. Querying the DNS: Suppose someone told you that you could call them at "sip:john@sip.somecompany.com". This is the simplest situation, as your SIP telephone (a software one, probably) will resolve the domain name "sip.somecompany.com" to the IP address of the server and then contact this server to establish the SIP call. The DNS can also support the translation of telephone numbers to IP addresses using the ENUM service. ENUM is outside the scope of this text, so let us just say that such a possibility exists.

2. Using a location service: The SIP RFC does not specify how a location service in a SIP network should be implemented. The usual solution, though, is a proxy server co-located with the registrar server. This scenario is applicable especially if you are using hardware SIP telephones and want to call other people simply by dialling their telephone number. This is how it works:

The SIP telephones register with the registrar server. The registrar stores the location information (telephone numbers and the related IP addresses) in a database.
The proxy server has access to the location database as well.
All telephones in the network know the address of the proxy server and if they want to call a certain number, they will always send the request (the INVITE message) to the proxy server. For example, if you dial 101 and the IP address of the proxy server is 10.1.1.10, the telephone will call the SIP URI: "sip:101@10.1.1.10".
The proxy server will forward the call request to the destination telephone.
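
The four steps above can be sketched as follows. Since the SIP RFC leaves the implementation of the location service open, everything here is an assumption made for illustration: the dictionary location_db, the functions register and proxy_forward, and the telephone address 10.1.1.101. Only the proxy address 10.1.1.10 and the dialled number 101 come from the example in the text.

```python
PROXY_IP = "10.1.1.10"

# Location database shared by the registrar and the proxy (steps 1 and 2).
location_db = {}


def register(number, ip_address):
    """Step 1: a telephone registers its number and current IP address."""
    location_db[number] = ip_address


def proxy_forward(request_uri):
    """Steps 3 and 4: map an INVITE sent to the proxy to the destination.

    Returns the URI the request is forwarded to, or None when the dialled
    number is not registered.
    """
    user, _, host = request_uri[len("sip:"):].partition("@")
    if host != PROXY_IP:
        raise ValueError("request was not addressed to this proxy")
    target_ip = location_db.get(user)
    if target_ip is None:
        return None
    return "sip:" + user + "@" + target_ip


register("101", "10.1.1.101")
print(proxy_forward("sip:101@10.1.1.10"))  # sip:101@10.1.1.101
```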

We now have sufficient background to have a look at individual SIP messages (both requests and responses) and then describe the SIP call flow.
