6 Audio Channels
In this section, we describe the audio channel classes used in our tutorial application. The default behaviour of the OpenH323 stack is to read (write) audio data from (to) a sound card. Our application needs to override this default behaviour, so we have to define our own audio channels.
The oh323tut application uses two channel classes, both of them derived from the class PIndirectChannel: WavChannel (files wavchan.h and wavchan.cxx) and NullChannel (files nullchan.h and nullchan.cxx). While studying the source code, you will probably realize that the functionality of these two classes could be combined into a single class. We have the two separate classes in order to make the code easier to understand. This also helps to emphasize that the incoming and outgoing audio are independent of one another.
When defining the two new channel classes, we need to override four virtual methods: Close(), IsOpen(), Read(), and Write(). Their prototypes are in file wavchan.h (lines 46–49) and in nullchan.h (lines 45–48), respectively. The role of each of the four methods is easy to understand. Once a channel instance is created, it is expected to be open. IsOpen() is used to verify that everything went well during the creation of the channel (e.g. device initialization or file opening), and Close() is called when the channel is no longer needed. Read() and Write() read data from and write data to the channel instance.
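To make the shape of these overrides concrete, here is a minimal sketch in standard C++. It does not use PWLib: the base class SketchChannel and the derived TrivialChannel are hypothetical stand-ins that only mirror the four methods and the lastReadCount/lastWriteCount counters discussed in this section, with bool and std::size_t in place of PWLib's own types.

```cpp
#include <cstddef>

// Hypothetical stand-in for a channel interface (NOT PWLib's PChannel).
// It mirrors only the four methods a derived audio channel overrides and
// the two counters the caller inspects after each call.
class SketchChannel {
public:
    virtual ~SketchChannel() = default;

    virtual bool Close() = 0;                                  // release the device or file
    virtual bool IsOpen() const = 0;                           // did construction succeed?
    virtual bool Read(void* buf, std::size_t len) = 0;         // fill buf with up to len bytes
    virtual bool Write(const void* buf, std::size_t len) = 0;  // consume len bytes from buf

    std::size_t GetLastReadCount() const { return lastReadCount; }
    std::size_t GetLastWriteCount() const { return lastWriteCount; }

protected:
    std::size_t lastReadCount = 0;   // set by Read() in derived classes
    std::size_t lastWriteCount = 0;  // set by Write() in derived classes
};

// A trivial derived channel: open from construction until Close() is called.
class TrivialChannel : public SketchChannel {
public:
    bool Close() override { open = false; return true; }
    bool IsOpen() const override { return open; }
    // Produces no data: zero bytes read, so Read() reports failure.
    bool Read(void*, std::size_t) override { lastReadCount = 0; return false; }
    // Accepts everything: all len bytes count as written.
    bool Write(const void*, std::size_t len) override { lastWriteCount = len; return true; }

private:
    bool open = true;
};
```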
6.1 WavChannel
The task of the WavChannel class is to read audio data from a WAV file. The declaration of WavChannel starts at line 36 in file wavchan.h. The class needs four data members. The member myConnection is a reference to an H323Connection object; we need it, for example, to close the connection when we reach the end of the WAV file. PWAVFile wavFile is an object that allows us to read audio from the WAV file. The remaining two data members, writeDelay and readDelay, are both of type PAdaptiveDelay. We explain the role of adaptive delays in detail below.
Let us now describe the implementation of WavChannel's methods.
6.1.1 Constructor and Destructor
WavChannel's constructor (file wavchan.cxx, lines 30–52) takes two formal parameters: a reference to a WAV file name and a reference to an H323Connection object. These two parameters are used in data member initializers in line 31. The constructor of PWAVFile attempts to open the file, so the first thing our constructor does is check whether the file has been opened successfully (lines 33–38). If the file is not open, we write out an error message, close the H.323 connection and return from the constructor.
Our next step inside the constructor (lines 39–48) is to check whether the WAV file has the required parameters, i.e. the format is PCM, mono (one sound channel only), the sampling rate is 8000 Hz, and the sample size is 16 bits. If the file does not meet the format requirements, we do the same as before, i.e. write out an error message and close the H.323 connection.
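The format test can be sketched as a pure function. This is an illustration only: the struct WavFormatInfo and the function IsSupportedFormat() are hypothetical and do not correspond to PWAVFile's accessors. The value 1 is the format tag the WAV format uses for uncompressed PCM.

```cpp
#include <cstdint>

// Hypothetical summary of the fields of a WAV file's "fmt " chunk.
struct WavFormatInfo {
    std::uint16_t formatTag;      // 1 = uncompressed PCM
    std::uint16_t channels;       // 1 = mono
    std::uint32_t sampleRate;     // samples per second
    std::uint16_t bitsPerSample;  // size of one sample in bits
};

// True only for 16-bit, 8000 Hz, mono PCM: the format the WavChannel
// constructor requires before it accepts the file.
bool IsSupportedFormat(const WavFormatInfo& f) {
    const std::uint16_t kPcm = 1;
    return f.formatTag == kPcm &&
           f.channels == 1 &&
           f.sampleRate == 8000 &&
           f.bitsPerSample == 16;
}
```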
The last line in the constructor is just a PTRACE statement that announces the successful creation of the WavChannel object.
The only statement in WavChannel's destructor (lines 57–60) is again a PTRACE statement that reports the removal of the channel object.
6.1.2 Close() and IsOpen()
The implementation of WavChannel's methods Close() (lines 65–68) and IsOpen() (lines 73–77) is quite simple — they return the truth values obtained from wavFile's Close() and IsOpen(), respectively.
6.1.3 Common Actions Required in Read() and Write() methods
Besides the actual processing of audio data, both Read() and Write() methods must take care of two things:
- notify the caller about the result of the read/write operation;
- take care of correct timing.
Notifying about the operation result
To notify the channel's user about the result of the read/write operation, we have to do two things. First, we set the channel's member variable lastWriteCount (in Write()) or lastReadCount (in Read()) to the number of bytes successfully written or read. After Write() or Read() returns, the caller can obtain this number from the channel's method GetLastWriteCount() or GetLastReadCount(), respectively.
Second, Read() and Write() should return true or false according to the requirements set forth in pwlib/include/ptlib/channel.h. Write() should return true if it succeeded in writing all the bytes passed to it, and false otherwise. Read() should return true if at least one byte was read, and false otherwise.
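The two return-value rules can be written down as a pair of hypothetical helper functions (they are not part of PWLib). Note the asymmetry: a short read still counts as success for Read(), whereas Write() must account for every byte.

```cpp
#include <cstddef>

// Write(): success only if every byte passed in was written.
bool WriteSucceeded(std::size_t requested, std::size_t written) {
    return written == requested;
}

// Read(): success if at least one byte was read, even if fewer than requested.
bool ReadSucceeded(std::size_t bytesRead) {
    return bytesRead > 0;
}
```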
Timing
In addition to reporting the success or failure of the read/write operation, we have to take care of timing. When we use a sound card as the source or destination of audio data, the card can provide rather precise timing for us. For example, if we read 80 samples with a sampling frequency of 8000 samples per second, the read operation will end (almost exactly) 10 milliseconds after the end of the previous read. Timing is especially important for the Read() method, as it influences the quality of audio at the other (receiving) party. Even though endpoints have jitter buffers, we should send RTP packets at intervals as accurate as possible.
We cannot rely on the sound card in WavChannel or NullChannel, but the time spent inside Read() or Write() should again correspond to the amount of data read or written. To achieve this, we need to add some sleeping. So, for example, if Write() is called for 480 bytes (i.e. 240 samples — this corresponds to 30 milliseconds) and the processing only needs, say, 1 millisecond inside Write() and 1 millisecond between two consecutive Write() calls, the additional sleep should last the remaining 28 milliseconds, to ensure that the time between two consecutive invocations of Write() is 30 milliseconds.
The problem with the sleep function on most systems (both Unix/Linux and Windows) is that it is not exact. It usually rounds up to multiples of 10 milliseconds, so the rounding error can be anything between 0 and 9 milliseconds. This is bad news if you consider that a typical call of either Read() or Write() works with 80, 160, or 240 samples, corresponding to 10, 20, or 30 milliseconds, respectively. If we behaved as if the sleeping were exact, we would accumulate a large error within just a few consecutive calls to Read() or Write(). We need an adaptive sleeping algorithm, so that even if we cause a timing error during one call of either Read() or Write(), this error is compensated in the subsequent call (or calls). That way, the departure time will not be exact for each individual RTP packet; nevertheless, the average interval between two consecutive packets should be close to the correct value.
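The following simulation illustrates the drift. It is a model, not real timing: the hypothetical CoarseSleep() rounds every request up to a multiple of 10 ms, and each iteration spends processingMs of work before sleeping for frameMs - processingMs, as if the sleep were exact.

```cpp
// Simulated sleep: rounds every positive request up to the next multiple of
// 10 ms, modelling a coarse system timer. No real time passes.
int CoarseSleep(int requestedMs) {
    if (requestedMs <= 0) return 0;
    return (requestedMs + 9) / 10 * 10;
}

// Naive pacing on a simulated clock: each iteration works for processingMs,
// then sleeps for the rest of the frame as if the sleep were exact.
// Returns the simulated clock (ms) after `calls` iterations.
int NaivePacing(int frameMs, int processingMs, int calls) {
    int now = 0;
    for (int i = 0; i < calls; ++i) {
        now += processingMs;
        now += CoarseSleep(frameMs - processingMs);  // rounding error is never corrected
    }
    return now;
}
```

With 20 ms frames and 1 ms of processing, the requested 19 ms sleep becomes 20 ms, so every iteration really lasts 21 ms: after 100 calls the simulated clock reads 2100 ms instead of 2000 ms, and the error keeps growing.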
PWLib implements the adaptive sleeping algorithm in a class called PAdaptiveDelay. The class uses the concept of "target time". When PAdaptiveDelay's method Delay(int time) is called for the first time (time is in milliseconds), the target time is set to the current time plus time milliseconds. During subsequent calls to Delay(), the target time is simply incremented by time milliseconds. After adjusting the target time, the algorithm computes the difference between the target time and the current time and then sleeps for that difference.
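A minimal real-time sketch of the target-time idea, written with std::chrono and std::this_thread::sleep_until rather than PWLib. AdaptiveDelaySketch is hypothetical and is not PWLib's PAdaptiveDelay: it follows only the description above (first call: now plus time; later calls: advance the target) and does not reproduce any other details of the real class.

```cpp
#include <chrono>
#include <thread>

class AdaptiveDelaySketch {
public:
    using Clock = std::chrono::steady_clock;

    // Sleeps so that the N-th call returns close to t0 + N * ms,
    // where t0 is the time of the first call.
    void Delay(int ms) {
        const auto step = std::chrono::milliseconds(ms);
        if (!started) {
            target = Clock::now() + step;  // first call: current time plus ms
            started = true;
        } else {
            target += step;                // later calls: advance the absolute target
        }
        // Sleep only for what is left until the target; if we are already
        // late, return at once so the next call can catch up.
        if (Clock::now() < target) {
            std::this_thread::sleep_until(target);
        }
    }

private:
    bool started = false;
    Clock::time_point target{};
};
```

Because the target is absolute, an overshoot in one call automatically shortens the next sleep, and a caller that is already late does not sleep at all.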
The fact that we use target time (an absolute value) helps us avoid accumulating timing errors. Suppose $T_i$ is the target time in the $i$-th iteration, $N$ is the current time (Now) and $e$ is a sleeping error. If, in the $i$-th iteration, the sleep takes $T_i - N + e$, the $i$-th iteration ends at time $T_i + e$ instead of (the ideal) $T_i$. In the next, $(i+1)$-th, iteration the current time $N$ will be approximately equal to $T_i + e$, so the sleep duration will be computed as $T_{i+1} - N = T_{i+1} - T_i - e$, and the error will be compensated. Again, the sleep in the $(i+1)$-th iteration may not be exact, and this will be corrected in the $(i+2)$-th iteration, and so on. This way, the average duration of one iteration stays close to the ideal time.
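The compensation can be checked on a simulated clock. In this hypothetical model, CoarseSleep() rounds every request up to a multiple of 10 ms and each iteration spends processingMs of work; AdaptivePacing() keeps an absolute target and sleeps only for the difference between the target and the current time, as described above. No real time passes.

```cpp
// Simulated sleep: rounds every positive request up to the next multiple of
// 10 ms, modelling a coarse system timer.
int CoarseSleep(int requestedMs) {
    if (requestedMs <= 0) return 0;
    return (requestedMs + 9) / 10 * 10;
}

// Target-time pacing on a simulated clock. Each iteration works for
// processingMs, advances the absolute target by frameMs and sleeps only
// target - now. Returns the simulated clock (ms) after `calls` iterations.
int AdaptivePacing(int frameMs, int processingMs, int calls) {
    int now = 0;
    int target = 0;
    for (int i = 0; i < calls; ++i) {
        now += processingMs;
        // First call: current time plus frameMs; later calls: advance the target.
        target = (i == 0) ? now + frameMs : target + frameMs;
        now += CoarseSleep(target - now);  // late iterations shorten the next sleep
    }
    return now;
}
```

With 20 ms frames and 1 ms of processing, the loop finishes 100 iterations at 2010 ms against a final target of 2001 ms: the lateness cycles between 0 and 9 ms instead of growing. A loop that simply slept frameMs - processingMs each time would, under the same model, last 21 ms per iteration and reach 2100 ms.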
6.1.4 Write()
The class WavChannel is primarily intended to read data from an audio file. Because of that, the method Read() is more important than Write(). In fact, WavChannel::Write() will never be called in our application, because the method MyEndPoint::OpenAudioChannel() assigns a WavChannel instance only to the thread responsible for outgoing audio. We do, however, implement WavChannel::Write() (file wavchan.cxx, lines 82–88). It is a good place to demonstrate the few steps that are required in every channel's Read() or Write() method.
Our WavChannel::Write() simply ignores any data passed to it, but pretends that the data buffer has been successfully written. This behaviour is the same as that of /dev/null in Unix. The length of the buffer (PINDEX len, the second parameter of Write()) is given in bytes. We first (file wavchan.cxx, line 85) set the channel member variable lastWriteCount to the number of (always successfully) written bytes. After that, we invoke an adaptive sleep by calling
writeDelay.Delay(len/2/8);
The object writeDelay is an instance of class PAdaptiveDelay (see 6.1.3 above). The method Delay() expects to receive the duration of the sleep in milliseconds. To obtain the number of milliseconds, we simply divide len (which is the length of the buffer in bytes) by 2 because each sample occupies 2 bytes (16 bits) and then by 8 because there are exactly 8 samples in one millisecond (the sampling rate is 8000 Hz).
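The conversion in the Delay() argument can be isolated in a small hypothetical helper (BytesToMs() is not part of the tutorial sources). It also shows a property of C++ integer division: a buffer length that is not a multiple of 16 bytes is truncated to whole milliseconds.

```cpp
#include <cstddef>

// Mirrors the expression len/2/8:
// bytes -> samples (2 bytes per 16-bit sample) -> ms (8 samples per ms at 8000 Hz).
// Integer division truncates toward zero.
int BytesToMs(std::size_t len) {
    return static_cast<int>(len / 2 / 8);
}
```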
The last step inside Write() is to return true to notify the caller that the entire buffer was processed successfully (again, see 6.1.3 above).
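The three steps of WavChannel::Write() can be sketched as a standalone class. DiscardingWriter is hypothetical; to keep it testable without real sleeping, the adaptive delay is passed in as a callback where the real class calls writeDelay.Delay().

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// /dev/null-style writer following the steps described above.
class DiscardingWriter {
public:
    explicit DiscardingWriter(std::function<void(int)> delayMs)
        : delay(std::move(delayMs)) {}

    bool Write(const void* /*buf*/, std::size_t len) {
        lastWriteCount = len;                  // 1. report every byte as written
        delay(static_cast<int>(len / 2 / 8));  // 2. pace: 16-bit samples, 8 per ms
        return true;                           // 3. the whole buffer was "processed"
    }

    std::size_t GetLastWriteCount() const { return lastWriteCount; }

private:
    std::function<void(int)> delay;
    std::size_t lastWriteCount = 0;
};
```

For a 480-byte buffer, the caller gets true back, GetLastWriteCount() returns 480, and the delay callback is asked for 30 ms.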
6.1.5 Read()
Let us now deal with WavChannel's method Read() (lines 93–117). Its task is to read audio data from a WAV file.
The code in lines 95 through 102 ensures that the channel works well with early media start, when the logical channels are started before the called endpoint sends CONNECT. Without it, we might send the first few seconds of the WAV file before the other party can hear them. To avoid this, we check (line 95) whether the H.323 connection is established and, if it is not, we fill the buffer with silence (zero bytes) instead of the actual file data. We naturally still have to perform the required steps, i.e. set lastReadCount (line 99) and take care of proper timing (line 100). We return true in line 101; the remaining part of the method is only executed once the connection is established.
We read audio data from the file in line 104, and if the read fails, we return false immediately. If the read is successful, we set the channel's lastReadCount to the value obtained from wavFile's method LastReadCount() and then (line 108) call the adaptive sleep (lastReadCount/2/8 evaluates to the number of milliseconds that corresponds to the number of samples read from the file, see also 6.1.3 and 6.1.4).
Lines 110–114 handle the situation in which the read operation returns less data than the requested amount (i.e. len). We expect this to happen when we reach the end of the audio file. We want to hang up the H.323 call at this moment, so we call myConnection.ClearCall() in line 113. Note that ClearCall() does not perform the whole connection clearing. It only initiates the end of the call and returns, so our Read() method can run to completion. The call clearing actions are performed in parallel by another thread.
The method ends in line 116 with a statement that returns true if at least one byte was read, conforming to the requirement in pwlib/include/ptlib/channel.h (see also 6.1.3).
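The control flow of Read() described in this subsection (early-media silence, file read, pacing, end-of-file hang-up, return value) can be sketched in standard C++. Everything here is hypothetical: the WAV file is an in-memory byte vector, and the connection check, ClearCall() and readDelay.Delay() are replaced by injected callbacks. The read-failure branch of the real code (line 104) is not modelled.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

class WavReaderSketch {
public:
    WavReaderSketch(std::vector<unsigned char> pcm,
                    std::function<bool()> isEstablished,
                    std::function<void()> clearCall,
                    std::function<void(int)> delayMs)
        : data(std::move(pcm)), established(std::move(isEstablished)),
          clear(std::move(clearCall)), delay(std::move(delayMs)) {}

    bool Read(void* buf, std::size_t len) {
        if (!established()) {
            // Early media: send silence so the start of the file is not lost.
            std::memset(buf, 0, len);
            lastReadCount = len;
            delay(static_cast<int>(len / 2 / 8));
            return true;
        }
        // "File" read: copy at most len bytes from the current position.
        std::size_t n = std::min(len, data.size() - pos);
        if (n > 0) std::memcpy(buf, data.data() + pos, n);
        pos += n;
        lastReadCount = n;
        delay(static_cast<int>(lastReadCount / 2 / 8));  // pace by what was read
        if (lastReadCount < len) {
            clear();  // end of file: only request the hang-up, then finish this call
        }
        return lastReadCount > 0;  // true if at least one byte was read
    }

    std::size_t GetLastReadCount() const { return lastReadCount; }

private:
    std::vector<unsigned char> data;
    std::size_t pos = 0;
    std::function<bool()> established;
    std::function<void()> clear;
    std::function<void(int)> delay;
    std::size_t lastReadCount = 0;
};
```

The clear callback only records that a hang-up was requested and returns, matching the point made about ClearCall(): Read() still finishes and reports the short read.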
6.2 NullChannel
The class NullChannel is intended to behave as /dev/null in Unix for writing and as /dev/zero for reading. The source of the class is quite simple and it reuses some code from WavChannel, so we will not describe it in detail.
NullChannel's method Write() is the same as WavChannel::Write(). It ignores all data passed to it, but reports them as successfully written — see 6.1.4 above. NullChannel's method Read() fills the buffer passed to it with silence (zero bytes). Both Read() and Write() use adaptive sleep — please refer to 6.1.3 through 6.1.5.