VoIP Protocols: H.323 Call Signalling Optimizations
In the previous piece, we discussed the basic H.323 call flow. This call signalling, as defined in H.323 version 1, suffers from several deficiencies:
- The negotiation takes too long. Quite a few messages need to be exchanged before the endpoints obtain the important information. Sometimes the signalling takes so long that users notice the delay between accepting the call and hearing the other party.
- The negotiation needs an additional TCP channel, which is painful, especially if the call needs to pass through network firewalls.
Besides these two, the most notable weak point of H.323 is that it needs quite a lot of processor power. It is not easy to implement it in an "embedded" device that is expected to be small and cheap and thus uses a slower processor. This is not something that can be helped easily, though, so let's have a look at the things that have been improved in H.323 version 2 and above.
To avoid the need to open a second TCP/IP channel for H.245, H.323 version 2 and above support the tunneling of H.245 messages. The H.245 message is encoded into binary form (using ASN.1 PER) and the resulting string of bytes is inserted into whichever Q.931/H.225.0 message is about to be sent. If no such message is pending, the endpoint sends a Facility message to carry it instead.
This approach introduces a second level of binary embedding: You encode the H.245 message and insert the binary form into the H.225.0 message, then encode the H.225.0 message and insert the binary form into the Q.931 message.
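The two levels of embedding can be sketched in a few lines of Python. This is only an illustration of the nesting: the `encode_placeholder` helper and the layer tags are hypothetical stand-ins (a one-byte tag plus a two-byte length), not ASN.1 PER and not the real Q.931 wire format.

```python
import struct

# Hypothetical layer tags, used only to make the nesting visible.
H245, H225, Q931 = 1, 2, 3


def encode_placeholder(tag: int, payload: bytes) -> bytes:
    """Stand-in encoder (NOT ASN.1 PER): 1-byte tag, 2-byte length, payload."""
    return struct.pack("!BH", tag, len(payload)) + payload


def decode_placeholder(expected_tag: int, data: bytes) -> bytes:
    """Reverse of encode_placeholder; checks that the layer tag matches."""
    tag, length = struct.unpack("!BH", data[:3])
    if tag != expected_tag:
        raise ValueError(f"expected layer {expected_tag}, got {tag}")
    return data[3:3 + length]


def wrap_h245_in_q931(h245_body: bytes) -> bytes:
    h245_bytes = encode_placeholder(H245, h245_body)   # encode the H.245 message
    h225_bytes = encode_placeholder(H225, h245_bytes)  # embed it in H.225.0, encode again
    return encode_placeholder(Q931, h225_bytes)        # embed that in the Q.931 message


def unwrap_q931(q931_bytes: bytes) -> bytes:
    h225_bytes = decode_placeholder(Q931, q931_bytes)
    h245_bytes = decode_placeholder(H225, h225_bytes)
    return decode_placeholder(H245, h245_bytes)


wire = wrap_h245_in_q931(b"openLogicalChannel")
print(unwrap_q931(wire))
```

The receiver has to peel the layers in the reverse order, which is part of why tunneling adds work for the parser even though it saves a TCP connection.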
With the standard H.245 negotiation, the two endpoints need three round-trips before they agree on the parameters of the audio/video channels (1. master/slave determination, 2. terminal capability set exchange, and finally, 3. opening the logical channels). In certain situations, especially with high-latency network links, this can take too long and users will notice the delay.
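To get a feeling for the numbers, here is a rough back-of-the-envelope estimate. It assumes the delay is simply the number of round-trips multiplied by the round-trip time, and it ignores TCP connection setup and processing time on the endpoints; the sample round-trip times are made up for illustration.

```python
def negotiation_delay_ms(round_trips: int, rtt_ms: float) -> float:
    """Lower bound on negotiation time: each round-trip costs one full RTT."""
    return round_trips * rtt_ms


# Three H.245 round-trips over links with different (assumed) round-trip times.
for rtt_ms in (20, 150, 600):
    delay = negotiation_delay_ms(3, rtt_ms)
    print(f"RTT {rtt_ms} ms -> at least {delay} ms of H.245 negotiation")
```

On a local network the cost is negligible, but on a 600 ms link (a satellite hop, say) the three round-trips alone approach two seconds.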
To overcome that, the Fast Connect procedure was designed (sometimes also called "Fast Start"). The endpoint simply prepares several variants of the H.245 request openLogicalChannel, one for each codec it supports. After that, the endpoint encodes each variant of the message into binary form and the resulting array of binary strings is inserted into an H.225.0/Q.931 message (usually the Setup message). The called party picks one of the variants and confirms it back in the next H.225.0/Q.931 message, together with its own list of logical channel variants. The rest of the H.245 negotiation is done the standard way.
With Fast Connect, the parameters of logical channels (codecs, IP addresses, and ports) are negotiated early in the message exchange, before the called user accepts the call. The price is yet another complication in the protocol. Note that Fast Connect and H.245 tunneling can be used in parallel.
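The offer-and-pick idea can be sketched as follows. The `ChannelProposal` class and both functions are hypothetical names invented for this illustration; a real endpoint would carry encoded openLogicalChannel structures rather than plain Python objects, and the selection policy (first match in the caller's order) is only one possible choice.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class ChannelProposal:
    """Simplified stand-in for one openLogicalChannel variant."""
    codec: str
    rtp_address: str
    rtp_port: int


def build_fast_start(codecs: Iterable[str], address: str, port: int) -> List[ChannelProposal]:
    """Caller side: one proposal per supported codec, in order of preference."""
    return [ChannelProposal(codec, address, port) for codec in codecs]


def pick_proposal(offered: List[ChannelProposal], supported: Set[str]) -> Optional[ChannelProposal]:
    """Called side: accept the first offered variant whose codec we support."""
    for proposal in offered:
        if proposal.codec in supported:
            return proposal
    return None  # nothing acceptable was offered


offer = build_fast_start(["G.729", "G.711u", "G.711a"], "192.0.2.10", 5004)
chosen = pick_proposal(offer, {"G.711u", "G.711a"})
print(chosen)
```

If `pick_proposal` returns `None`, an endpoint would presumably fall back to the standard H.245 negotiation described earlier.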
H.323 version 4 added yet another method to speed up the negotiation of audio/video parameters, called Parallel H.245. The method again uses binary embedding: the calling endpoint creates its H.245 request terminalCapabilitySet at the very beginning of the call and embeds this message in the H.225.0/Q.931 Setup message. This way, the called party knows the caller's whole capability list right from the beginning. Parallel H.245 can be combined with both Fast Connect and H.245 tunneling.
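Since all three methods can ride in the same Setup message, it may help to picture that message as a container with three optional parts. The sketch below is a simplified model, not the real H.225.0 structure: the field names loosely mirror what I believe the standard calls fastStart, parallelH245Control, and h245Tunneling, but treat them as illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SetupMessage:
    """Simplified model of a Setup message (illustrative field names)."""
    h245_tunneling: bool = False
    # Encoded openLogicalChannel variants (Fast Connect).
    fast_start: List[bytes] = field(default_factory=list)
    # Encoded terminalCapabilitySet and friends (Parallel H.245).
    parallel_h245_control: List[bytes] = field(default_factory=list)


def optimizations_in_use(setup: SetupMessage) -> List[str]:
    """List which of the optimizations this Setup message asks for."""
    used = []
    if setup.h245_tunneling:
        used.append("H.245 tunneling")
    if setup.fast_start:
        used.append("Fast Connect")
    if setup.parallel_h245_control:
        used.append("Parallel H.245")
    return used


setup = SetupMessage(
    h245_tunneling=True,
    fast_start=[b"<encoded OLC variant 1>", b"<encoded OLC variant 2>"],
    parallel_h245_control=[b"<encoded terminalCapabilitySet>"],
)
print(optimizations_in_use(setup))
```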
Early Media Start
This method aims to eliminate delays when starting RTP streams. The time needed to allocate all the required memory structures, open sockets, and start sending the RTP stream might not be that significant, but it adds to the overall delay, especially on slower (embedded) platforms.
The solution is to start sending RTP as soon as the media channels are negotiated, even before the called user accepts the call. You naturally cannot start sending actual sound from the user's microphone but should send silence or some random noise instead. You switch to the real sound when the user accepts the call.
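A minimal sketch of the payload switch, assuming G.711 mu-law audio in 20 ms frames (160 samples at 8 kHz), where the byte 0xFF encodes silence. The function name and the `call_accepted` flag are hypothetical; a real endpoint would tie this to its call state machine and RTP sender.

```python
ULAW_SILENCE = 0xFF   # G.711 mu-law encoding of a zero-amplitude sample
FRAME_SAMPLES = 160   # 20 ms of audio at 8 kHz, one byte per sample


def next_payload(call_accepted: bool, mic_frame: bytes) -> bytes:
    """Choose the RTP payload for the next frame.

    Before the call is accepted we send silence so the stream is already
    running; afterwards we pass the microphone audio through unchanged.
    """
    if call_accepted:
        return mic_frame
    return bytes([ULAW_SILENCE]) * FRAME_SAMPLES


mic = bytes(range(160))              # pretend microphone frame
early = next_payload(False, mic)     # sent while the phone is still ringing
live = next_payload(True, mic)       # sent once the user picks up
print(len(early), early[:4].hex(), live == mic)
```

The point is that the sockets, buffers, and RTP sequence numbering are all up and running before the user answers, so the switch to real audio is only a change of payload.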
I hope this text gave you a good introduction to H.323 call signalling optimizations. Should you need more detail, please consult the H.323 standard itself.
In practice, you might need to experiment to find which combination of the optimization methods works best with your hardware and your network. I'd recommend starting with H.245 tunneling and Fast Connect enabled and then adding Parallel H.245 and Early Media Start to see if they make any difference.
Comments on this piece, or the VoIP Overview as a whole, are welcome on Vladimir's blog.