VoIP Basics: Overview of Audio Codecs
The first part of this series described the conversion of voice to digital form. Once we have the audio signal represented as a sequence of samples, the next step is to compress it to reduce the network bandwidth needed to transmit the speech to the receiving party.
The compression and decompression are handled by special algorithms we call codecs (COder-DECoder). Let's have a look at some popular codecs that are used in Voice over IP. All the codecs we list here expect the input to be audio sampled at 8 kHz with 16-bit samples.
In the text below, we will mention the MOS of the codecs. MOS stands for "Mean Opinion Score". MOS measures the perceived quality of audio after it has been compressed by the particular codec, transmitted, and decompressed. The score is assigned by a group of listeners using the procedure specified in ITU-T standards P.800 and P.830; the MOS is the average of their ratings. The interpretation of individual MOS values is as follows: 5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad.
Let's now describe the individual codecs.
G.711 is a codec that was introduced by the ITU in 1972 for use in digital telephony, i.e. in ISDN, T1 and E1 links. The codec has two variants: A-law is used in Europe and on international telephone links, μ-law is used in the U.S.A. and Japan.
G.711 uses logarithmic compression. It squeezes each 16-bit sample to 8 bits, so it achieves a compression ratio of 2:1. The resulting bitrate is 64 kbit/s for one direction, so a two-way call consumes 128 kbit/s (plus some overhead for packet headers). This is quite a lot when compared with other codecs.
This codec can be used freely in VoIP applications as there are no licensing fees. It works best in local area networks where we have a lot of bandwidth available. Its benefits include a simple implementation that does not need much CPU power (it can be implemented using a relatively simple table lookup) and very good perceived audio quality: the MOS value is 4.2.
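To illustrate the logarithmic compression that G.711 relies on, here is a minimal Python sketch of the continuous μ-law curve with μ = 255. It is not a bit-exact G.711 encoder: the real codec uses a piecewise-linear approximation of this curve and its own bit layout, and the function names mulaw_compress and mulaw_expand are made up for this example.

```python
import math

MU = 255  # mu-law parameter used in North American and Japanese telephony


def mulaw_compress(sample: int) -> int:
    """Map a signed 16-bit sample to an 8-bit code (0..255) on the mu-law curve."""
    # Normalize to [-1.0, 1.0].
    x = max(-1.0, min(1.0, sample / 32768.0))
    # Continuous mu-law: sign(x) * ln(1 + MU*|x|) / ln(1 + MU)
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    # Quantize the companded value from [-1, 1] to an integer code 0..255.
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_expand(code: int) -> int:
    """Approximate inverse of mulaw_compress: 8-bit code back to a 16-bit sample."""
    y = code / 255 * 2.0 - 1.0
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32767))


for s in (100, 1000, 20000):
    code = mulaw_compress(s)
    print(s, "->", code, "->", mulaw_expand(code))
```

Quiet samples survive the round trip with a small absolute error while loud ones are reproduced more coarsely, which is the point of logarithmic companding: the quantization error stays roughly proportional to the signal level.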
G.729 is a codec that has low bandwidth requirements but provides good audio quality (MOS = 4.0). The codec encodes audio in frames, and each frame is 10 milliseconds long. Given the sampling frequency of 8 kHz, a 10 ms frame contains 80 audio samples. The codec algorithm encodes each frame to 10 bytes, so the resulting bitrate is 8 kbit/s for one direction.
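The frame arithmetic for G.729 can be checked with a few lines of Python. This is only a sketch of the calculation described in the text; the helper names samples_per_frame and codec_bitrate_kbps are invented for the example.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of audio samples covered by one codec frame."""
    return sample_rate_hz * frame_ms // 1000


def codec_bitrate_kbps(frame_bytes: int, frame_ms: int) -> float:
    """Bitrate of the encoded stream; bits per millisecond equals kbit/s."""
    return frame_bytes * 8 / frame_ms


# G.729 figures taken from the text: 8 kHz sampling, 10 ms frames, 10 bytes per frame.
n = samples_per_frame(8000, 10)
raw_kbps = codec_bitrate_kbps(n * 2, 10)   # uncompressed: 16-bit (2-byte) samples
g729_kbps = codec_bitrate_kbps(10, 10)

print("samples per frame:", n)
print("raw bitrate (kbit/s):", raw_kbps)
print("G.729 bitrate (kbit/s):", g729_kbps)
print("compression ratio:", raw_kbps / g729_kbps)
```

The same two helpers work for any frame-based codec once the frame duration and encoded frame size are known.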
In VoIP, we usually send 3-6 G.729 frames in each packet. We do this because the packet headers (IP, UDP, and RTP together) add 40 bytes of overhead: a packet carrying a single 10-byte frame would be 80% headers, so bundling several frames improves the ratio of "useful" information.
G.729 is a licensed codec. As far as end users are concerned, the easiest path to using it is to buy hardware that implements it (be it a VoIP phone or a gateway). In such a case, the licensing fee has already been paid by the producer of the chip used in the device.
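The effect of packing several frames into one packet is easy to quantify. The sketch below assumes the 40-byte header figure from the text splits into 20 bytes for IPv4, 8 for UDP, and 12 for RTP, and it ignores link-layer overhead such as Ethernet framing; packet_stats is a hypothetical helper name.

```python
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP = 40 bytes, as in the text
FRAME_BYTES = 10             # one encoded G.729 frame
FRAME_MS = 10                # audio covered by one frame


def packet_stats(frames_per_packet: int):
    """Return (payload bytes, payload share of packet, one-way bandwidth in kbit/s)."""
    payload = frames_per_packet * FRAME_BYTES
    total = payload + HEADER_BYTES
    efficiency = payload / total
    packets_per_second = 1000 / (frames_per_packet * FRAME_MS)
    bandwidth_kbps = total * 8 * packets_per_second / 1000
    return payload, efficiency, bandwidth_kbps


for n in (1, 2, 3, 6):
    payload, efficiency, kbps = packet_stats(n)
    print(f"{n} frame(s): payload {payload} B, "
          f"{efficiency:.0%} useful, {kbps:.1f} kbit/s one way")
```

With one frame per packet the 8 kbit/s codec costs 40 kbit/s on the wire; with six frames per packet it drops to about 13.3 kbit/s, at the price of 60 ms of audio being buffered before each packet can be sent.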
A frequently used variant of G.729 is G.729a. It is wire-compatible with the original codec but has lower CPU requirements.
G.723.1 is the result of a competition that the ITU announced with the aim of designing a codec that would allow calls over 28.8 and 33.6 kbit/s modem links. There were two very good solutions and the ITU decided to use them both. Because of that, we have two variants of G.723.1. They both operate on audio frames of 30 milliseconds (i.e. 240 samples), but the algorithms differ. The bitrate of the first variant is 6.4 kbit/s (usually quoted as 6.3 kbit/s) and the MOS is 3.9. The bitrate of the second variant is 5.3 kbit/s with MOS = 3.7. The encoded frames for the two variants are 24 and 20 bytes long, respectively.
G.723.1 is a licensed codec; the last patent that covers it is expected to expire in 2014.
GSM 06.10 (also known as GSM Full Rate) is a codec designed by the European Telecommunications Standards Institute for use in GSM mobile networks. This variant of the GSM codec can be used freely, so you will often find it in open source VoIP applications. The codec operates on audio frames 20 milliseconds long (i.e. 160 samples) and compresses each frame to 260 bits, so the resulting bitrate is 13 kbit/s. Since 260 bits is exactly 32.5 bytes, the encoded frame is stored in 33 bytes and 4 bits are unused in each frame. The codec's Mean Opinion Score is 3.7.
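The relationship between the 32.5-byte frame, the 33 bytes actually stored, and the 13 kbit/s bitrate can be worked out as follows. This is a small illustrative calculation only; padded_frame is a made-up helper, and the figures are the ones quoted in the text.

```python
import math


def padded_frame(payload_bits: int):
    """Round a frame up to whole bytes; return (frame bytes, unused bits)."""
    frame_bytes = math.ceil(payload_bits / 8)
    unused_bits = frame_bytes * 8 - payload_bits
    return frame_bytes, unused_bits


GSM_FRAME_BITS = 260   # 32.5 bytes of encoded speech per frame
GSM_FRAME_MS = 20      # each frame covers 20 ms of audio

frame_bytes, unused_bits = padded_frame(GSM_FRAME_BITS)
nominal_kbps = GSM_FRAME_BITS / GSM_FRAME_MS    # bits per ms equals kbit/s
stored_kbps = frame_bytes * 8 / GSM_FRAME_MS    # counting the padding bits too

print("frame bytes:", frame_bytes, "unused bits:", unused_bits)
print("nominal bitrate (kbit/s):", nominal_kbps)
print("bitrate including padding (kbit/s):", stored_kbps)
```

The nominal 13 kbit/s counts only the 260 speech bits; counting the full 33-byte frame gives 13.2 kbit/s.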
Speex is an open source, patent-free codec designed by the Xiph.org Foundation. It is designed to work with sampling rates of 8 kHz, 16 kHz, and 32 kHz and can compress the audio signal to bitrates between 2 and 44 kbit/s. For use in VoIP telephony, the most common choice is the 8 kHz (narrowband) variant.
iLBC (internet Low Bit Rate Codec) is a free codec developed by Global IP Solutions (later acquired by Google). The codec is defined in RFC 3951. With iLBC, you can choose to use either 20 ms or 30 ms frames; the resulting bitrates are 15.2 kbit/s and 13.33 kbit/s, respectively. Like Speex and GSM 06.10, iLBC can be found in many open source VoIP applications.
Comments on this piece, or the VoIP Overview as a whole, are welcome on Vladimir's blog.