VoIP Protocols: Introducing H.323

Vladimír Toncar
 

"Packet-based multimedia communications systems", better known as H.323, is an international Voice over IP standard defined by the International Telecommunication Union (ITU). The first version of H.323 was published in 1996; the sixth version appeared in 2006.

When dealing with H.323, it is good to realize that it is not a single protocol but rather an entire group of protocols. The individual protocols used under the umbrella of H.323 include:

  • H.225.0 for call signalling;
  • Q.931, a protocol borrowed from ISDN, also used for call signalling;
  • H.245 for negotiating audio/video channel parameters;
  • H.235 for security and authentication;
  • RTP, the Real-time Transport Protocol defined by the IETF, used to transmit audio/video streams;
  • H.450.x for additional services like call transfer, call diversion, etc.

Most of these protocols are defined in the ASN.1 language and the protocol messages are encoded with ASN.1 PER (Packed Encoding Rules). There are two exceptions where ASN.1 is not used: Q.931 (borrowed from the ISDN stack) and RTP, the protocol defined by IETF.

H.323 Entities

Let's start by explaining the names that H.323 uses for various entities that appear in the VoIP network.

A Terminal is typically a software or hardware VoIP phone. Certain programs (for example, voice mail software) can also present themselves as terminals in the protocol exchange.

A Gateway is a device that allows bidirectional communication with devices in another telecommunication network. The other network is usually the PSTN, but you can also have an H.323-to-SIP gateway or even an H.323-to-H.323 gateway. Formally, a gateway consists of two sub-components: (1) the Media Gateway Controller (MGC), which handles call signalling, and (2) the Media Gateway (MG), which routes the audio (and possibly video) streams. You will usually find the two components implemented within a single box, but they can also be separate if you want the gateway to scale to a higher number of concurrent calls (in that situation, you typically have a single MGC and several MGs).

A Multipoint Control Unit (MCU) is a device that is used for multiparty conferencing. Again, it formally consists of two function blocks, a Multipoint Controller (MC) and a Multipoint Processor (MP), where the latter is responsible for mixing the audio/video channels for the conference.

Terminals, gateways, and MCUs are collectively referred to as Endpoints. In addition to endpoints, the H.323 network can optionally have a fourth component, a Gatekeeper. Gatekeepers play the role of central controllers in the network. The most important tasks of a gatekeeper are registration of endpoints and call admission. The set of endpoints managed by the same gatekeeper is called a Zone.
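The relationships between these entities can be sketched as a small data model. This is only an illustration: the class names, the alias strings, and the toy admission rule (a call is admitted when the caller is registered) are assumptions made for the example, not part of the H.323 standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class EndpointType(Enum):
    """The three kinds of H.323 endpoints."""
    TERMINAL = "terminal"
    GATEWAY = "gateway"
    MCU = "mcu"


@dataclass(frozen=True)
class Endpoint:
    alias: str            # hypothetical identifier, e.g. a phone number
    kind: EndpointType


@dataclass
class Gatekeeper:
    """Optional central controller; the endpoints it manages form its zone."""
    name: str
    zone: Dict[str, Endpoint] = field(default_factory=dict)

    def register(self, endpoint: Endpoint) -> None:
        self.zone[endpoint.alias] = endpoint

    def unregister(self, alias: str) -> None:
        self.zone.pop(alias, None)

    def admit_call(self, caller_alias: str) -> bool:
        # Toy policy (an assumption): admit only callers registered in the zone.
        return caller_alias in self.zone


gk = Gatekeeper("gk1")
gk.register(Endpoint("1001", EndpointType.TERMINAL))
gk.register(Endpoint("pstn-gw", EndpointType.GATEWAY))
```

After these calls, the zone of `gk` contains two endpoints, and `gk.admit_call("1001")` returns `True` while an unregistered alias is refused.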

Communication Between Entities

Let's now look at how the individual entities in the H.323 network use the various sub-protocols of H.323.

First, for the endpoint-gatekeeper and gatekeeper-gatekeeper communication, a subset of the H.225.0 protocol is used. This subset of H.225.0 is known as RAS (Registration, Admission, Status). H.225.0-RAS contains messages for endpoint registration and unregistration at the gatekeeper, messages for call admission, call end, gatekeeper discovery, etc. The H.225.0-RAS messages are sent over the UDP protocol and the gatekeeper listens at port 1719/udp (unicast) and 1718/udp (multicast). The multicast address reserved for gatekeeper communication is 224.0.1.41.
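The port numbers and the multicast group above are enough to sketch how an endpoint might choose where to send a RAS message, and how a gatekeeper might listen for discovery requests. The function names here are hypothetical, and the sketch deliberately stops short of building actual RAS messages, which would require an ASN.1 PER encoder.

```python
import socket

RAS_UNICAST_PORT = 1719             # registration, admission, status
RAS_DISCOVERY_PORT = 1718           # multicast gatekeeper discovery
RAS_MULTICAST_GROUP = "224.0.1.41"  # group reserved for gatekeeper communication


def ras_destination(gatekeeper_ip=None):
    """Pick the UDP destination for a RAS message.

    Without a known gatekeeper, fall back to multicast discovery.
    """
    if gatekeeper_ip is None:
        return (RAS_MULTICAST_GROUP, RAS_DISCOVERY_PORT)
    return (gatekeeper_ip, RAS_UNICAST_PORT)


def multicast_membership(interface_ip="0.0.0.0"):
    """Build the ip_mreq bytes expected by the IP_ADD_MEMBERSHIP option."""
    return socket.inet_aton(RAS_MULTICAST_GROUP) + socket.inet_aton(interface_ip)


def open_discovery_listener(interface_ip="0.0.0.0"):
    """Open a UDP socket that receives multicast discovery requests."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", RAS_DISCOVERY_PORT))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    multicast_membership(interface_ip))
    return sock
```

An endpoint that already knows its gatekeeper would call `ras_destination("192.0.2.10")` (an example address) and send to port 1719; one that does not would send its discovery request to the multicast group on port 1718.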

Second, for call signalling between endpoints, H.323 uses the Q.931 protocol. Q.931 has been borrowed from ISDN and its messages contain the typical telephony data (like the calling and called numbers). However, Q.931 does not have certain data fields that are needed for Voice over IP communication (for example, IP addresses and listening ports). To solve this, Q.931 messages embed H.225.0 messages that carry the complete information. The H.225.0 message is encoded into binary form using ASN.1 PER and then inserted into a field of the corresponding Q.931 message that can carry a custom string of bytes (known as the User-User Information Element). Generally speaking, embedding the messages of one protocol inside the messages of another is used quite frequently throughout H.323.
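The embedding can be illustrated with a simplified byte-level sketch. It assumes the H.225.0 payload has already been PER-encoded elsewhere (here it is just an opaque placeholder), and it omits the other information elements that a real SETUP message carries, such as Bearer Capability. The function names are hypothetical.

```python
import struct

Q931_PROTOCOL_DISCRIMINATOR = 0x08  # Q.931 call control
MSG_SETUP = 0x05                    # Q.931 SETUP message type
IE_USER_USER = 0x7E                 # User-User Information Element identifier
UU_ASN1_DISCRIMINATOR = 0x05        # user information is ASN.1 (X.208/X.209) coded


def build_q931(message_type: int, call_reference: int, h225_per: bytes) -> bytes:
    """Wrap PER-encoded H.225.0 bytes in a minimal Q.931 message."""
    # Header: discriminator, call reference length (2), call reference, type.
    header = struct.pack("!BBHB", Q931_PROTOCOL_DISCRIMINATOR, 2,
                         call_reference, message_type)
    uu_body = bytes([UU_ASN1_DISCRIMINATOR]) + h225_per
    # In H.225.0 the User-User IE uses a two-octet length field.
    user_user_ie = struct.pack("!BH", IE_USER_USER, len(uu_body)) + uu_body
    return header + user_user_ie


def extract_h225(message: bytes) -> bytes:
    """Walk the information elements and return the embedded H.225.0 bytes."""
    call_ref_len = message[1] & 0x0F
    offset = 2 + call_ref_len + 1          # skip header and message type
    while offset < len(message):
        ie_id = message[offset]
        if ie_id & 0x80:                   # single-octet information element
            offset += 1
            continue
        if ie_id == IE_USER_USER:
            (length,) = struct.unpack_from("!H", message, offset + 1)
            body = message[offset + 3:offset + 3 + length]
            return body[1:]                # drop the protocol discriminator
        offset += 2 + message[offset + 1]  # other IEs: one-octet length
    raise ValueError("no User-User information element found")


placeholder_h225 = b"\x01\x02\x03"  # stands in for real PER-encoded data
setup = build_q931(MSG_SETUP, 0x1234, placeholder_h225)
```

Calling `extract_h225(setup)` returns the placeholder bytes unchanged, which is the round trip a receiving endpoint performs before handing the payload to its PER decoder.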

Third, the H.245 protocol is used to negotiate audio (and video) parameters between endpoints. The negotiation covers codecs, IP addresses and ports, i.e. the parameters needed for RTP streams.
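As a rough illustration of what the negotiation settles, the sketch below picks the first codec on the sender's preference list that the receiver has announced it can accept, and pairs it with the receiver's RTP address and port. This is a heavy simplification: real H.245 exchanges capability sets and opens logical channels through ASN.1 messages, and all names and addresses in the example are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class MediaChannel:
    """The parameters an RTP stream needs once negotiation is done."""
    codec: str
    rtp_ip: str
    rtp_port: int


def pick_codec(sender_preferences: Sequence[str],
               receiver_capabilities: Iterable[str]) -> Optional[str]:
    """Return the sender's most preferred codec that the receiver accepts."""
    accepted = set(receiver_capabilities)
    for codec in sender_preferences:
        if codec in accepted:
            return codec
    return None


def open_channel(sender_preferences: Sequence[str],
                 receiver_capabilities: Iterable[str],
                 receiver_ip: str, receiver_port: int) -> Optional[MediaChannel]:
    """Combine the chosen codec with the receiver's RTP address and port."""
    codec = pick_codec(sender_preferences, receiver_capabilities)
    if codec is None:
        return None  # no common codec, so no media channel can be opened
    return MediaChannel(codec, receiver_ip, receiver_port)


channel = open_channel(["G.729", "G.711"], ["G.711", "G.723.1"],
                       "192.0.2.20", 5004)
```

Here the sender prefers G.729, but the receiver does not accept it, so the channel falls back to G.711.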

Last but not least, the Real-time Transport Protocol (RTP) is used to carry the audio/video streams between the communicating endpoints.
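To give a feel for what travels in those streams, here is a minimal sketch of the 12-byte fixed RTP header as laid out in RFC 3550, followed by the media payload. It ignores padding, header extensions, and contributing sources on the sending side; the function names and the sample values are assumptions for the example.

```python
import struct

RTP_VERSION = 2
RTP_HEADER = struct.Struct("!BBHII")  # flags, marker/type, sequence, timestamp, SSRC


def build_rtp_packet(payload_type: int, sequence: int, timestamp: int,
                     ssrc: int, payload: bytes, marker: bool = False) -> bytes:
    """Prefix a media payload with the fixed RTP header."""
    first = RTP_VERSION << 6  # no padding, no extension, zero CSRC entries
    second = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(first, second, sequence & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF) + payload


def parse_rtp_packet(packet: bytes) -> dict:
    """Split a packet into its header fields and payload."""
    first, second, sequence, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    csrc_count = first & 0x0F
    payload_offset = RTP_HEADER.size + 4 * csrc_count
    return {
        "version": first >> 6,
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[payload_offset:],
    }


# 20 ms of G.711 mu-law audio (payload type 0) is 160 bytes at 8 kHz.
packet = build_rtp_packet(0, 1, 160, 0x12345678, b"\xff" * 160)
fields = parse_rtp_packet(packet)
```

The sequence number lets the receiver detect lost or reordered packets, and the timestamp lets it play the samples back at the right pace.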

We will deal with the individual sub-protocols of H.323 in detail in subsequent parts of the Overview.

Next section: VoIP Protocols: H.323 Call Flow

Comments on this piece, or the VoIP Overview as a whole, are welcome on Vladimir's blog.
