Programmer’s Guide
Interface Description/Programmer’s Guide
Details of WINRTP Interface (Interface ICCNMediaTerm)
Level of Auto-Amplification of incoming RTP audio streams
Type of Service Value (DiffServ Byte) of Outgoing RTP Packets
Fixed Transmit Port for UDP Packets
Pre-emphasis of transmitted audio
Post-emphasis of received audio
This document describes the WinRTP for the Cisco IP SoftPhone from a programmer’s point of view. It discusses the COM interface it provides, installation, and configuration of the component.
The WinRTP (WINRTP) was developed as part of the Cisco IP SoftPhone product. Cisco IP SoftPhone is a PC-based telephone that is integrated with AVVID and works with the Cisco Call Manager. The primary focus of the WINRTP is to work well with other products in AVVID, including desktop IP Phones, gateways, etc. It can also be used as an independent component.
· It is written in C++
· It is a COM component (not an ActiveX control). This makes it easy to use from programming languages such as C, C++, and Java (using J/Direct).
The basic job of the WINRTP is to source and sink audio streams to/from the network. So if an application needs the ability to do audio endpointing for real time voice (especially one that is integrated with Cisco’s AVVID), it can use this component.
Here are the improvements since the last version
WINRTP consists of two independent parts. One part captures the user’s voice (using the system’s microphone), encodes it, and sends it as an RTP (Real-time Transport Protocol) stream to a configurable destination. The other part listens for an RTP stream from the network, extracts the audio from it, and plays it through the PC’s speaker. When both parts are used together, WINRTP acts as an IP-based voice endpoint. Here are the features in short
The distribution comes with both source code and binaries. Extract the ZIP file to obtain everything. It will create a directory called WinRTP, which will contain everything.
Build Order
This will create TraceServer.dll and CCNSMT.dll. Build these projects either in Debug or Release mode.
The WINRTP binary distribution consists of two DLL files under the WinRTP directory. The main COM DLL (CCNSMT.dll), which exposes the WINRTP interface, is about 200 KB, while the other DLL (TraceServer.dll), which is used for tracing, is about 28 KB.
None
All COM objects must be registered before WINRTP can be used. For this, use the regsvr32 program that comes with Windows (it may be found in the system directory), for example: regsvr32 CCNSMT.dll
There is also a test program (both source code and binary) that is available. It is a simple program that does not exercise all the features. It just connects your default microphone to your default speaker for 5 seconds so that you can hear yourself, and then exits. The source code is in the “WinRTP/TestWinRTP” folder and the binary is WinRTP/TestWinRTP.exe
The WinRTP main DLL is CCNSMT.dll which exposes a COM interface that can be used to make calls to the WINRTP.
The interface of WINRTP (ICCNMediaTerm) consists of the following functions
WINRTP not only exposes a COM interface, it can also fire events to the component that is using WINRTP. This is done through the standard Connection Point mechanism. For information on connection points, read a book on COM and ATL (Active Template Library). The basic idea is that WINRTP describes a COM interface for receiving events. If a component implements that COM interface, it can subscribe itself as a listener of the events generated by the WINRTP.
The events interface (ICCNMediaTermEvents) is as follows
· HRESULT EndOfFileEventRX([in] long Cookie)
· HRESULT EndOfFileEventTX([in] long Cookie)
All methods in the interface return an HRESULT value: 0 if the method succeeds, or a negative number on failure. The specific return values are still changing, so the recommended way to debug a failing call is to use the trace mechanism (i.e. turn on tracing for the WINRTP and look at the trace file, which includes a description of the error that caused the negative return value). If problems persist, contact the developer of WINRTP for details/help. Important return values are discussed for some functions, but not for all.
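As a minimal sketch of the sign convention described above, the helper below treats any negative result as a failure. The HResult alias and the Failed and Describe names are hypothetical stand-ins so that the snippet compiles without the Windows headers; in real Windows code the SDK's HRESULT type and FAILED() macro serve the same purpose.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the Windows HRESULT type (a 32-bit signed value).
using HResult = std::int32_t;

// A negative value signals failure; zero signals success.
inline bool Failed(HResult hr) { return hr < 0; }

// Builds a short log line for a call. The trace file remains the
// authoritative source for the actual cause of a failure.
inline std::string Describe(const std::string& method, HResult hr) {
    return method + (Failed(hr) ? " failed" : " succeeded");
}
```

Checking every call this way, and consulting the trace file whenever Failed() returns true, avoids depending on specific error codes that may change between versions.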
Description:
Initializes the WINRTP. Instantiates all components. Also sets up default codecs using the following calls
This function must be called before any other calls to WINRTP
Parameters:
None
Description:
Uninitializes the WINRTP and releases all allocated resources. This must be the last call made to the WINRTP
Parameters:
None
Description:
Not implemented/needed in this version
Parameters:
None
Description:
Not implemented/needed in this version
Parameters:
None
Description:
Not implemented/needed in this version
Parameters:
None
Description:
Not implemented/needed in this version
Parameters:
None
Description:
Call this function to inform WINRTP of the audio codec used to encode the incoming RTP stream. This function should be called before StartRX is called (so you may need to call StopRX before making this call). If called before StartRX, it sets the codec for the next invocation of StartRX. If it is called while receiving audio (i.e. after StartRX), it may return an error.
Parameters:
[in] long CompressionType: The following values are supported
· 2 : G.711 Alaw 64kbps
· 4 : G.711 Ulaw 64kbps
[in] long MillisecPacketSize: Specifies the length of audio, in milliseconds, in each incoming RTP audio packet
[in] long EchoCancellationValue: Ignored. Put any value here. Echo cancellation is not supported in the WINRTP
[in] long G723BitRate: Ignored
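For illustration, the two supported CompressionType values can be given names, and the payload size implied by MillisecPacketSize follows from the G.711 bit rate: 64 kbps is 8000 bytes per second, or 8 bytes per millisecond. The enum and function names below are hypothetical; only the numeric values 2 and 4 come from this guide.

```cpp
#include <cassert>

// Hypothetical names for the CompressionType values listed above.
enum class CompressionType : long {
    G711Alaw = 2,  // G.711 A-law, 64 kbps
    G711Ulaw = 4   // G.711 u-law, 64 kbps
};

// Audio payload bytes carried by one G.711 packet of the given duration.
// 64 kbps = 8000 bytes per second = 8 bytes per millisecond.
inline long G711PayloadBytes(long millisecPacketSize) {
    assert(millisecPacketSize > 0);
    return millisecPacketSize * 8;
}
```

With 20 ms packets, each packet carries 160 bytes of audio, not counting the RTP header.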
Description:
Sets the audio codec for the transmit stream (outgoing stream). Should be called while NOT streaming (i.e. before StartTX/after StopTX)
Parameters:
[in] long CompressionType: See SetAudioCodecRX
[in] long MillisecPacketSize: See SetAudioCodecRX
[in] long PrecedenceValue: Ignored
[in] long SilenceSuppression: Specifies whether to do silence suppression in the transmit stream
· 0 : Silence suppression is turned OFF
· 1 : Silence suppression is turned ON
[in] unsigned short MaxFramesPerPacket: Ignored
[in] long G723BitRate: See SetAudioCodecRX
Description:
Sets the destination [IP Address, UDP Port] where the send side audio stream should be transmitted. Must be called while not streaming (i.e. Before StartTX/after StopTX).
Parameters:
[in] BSTR strHostName: IP address of the destination, e.g. “171.69.12.34”
[in] long nUDPortNumber: UDP port number to which the stream is sent
Description:
Informs the WINRTP of the UDP port number where it should listen for the incoming RTP audio stream. Note: StartAudioReceive must be called before any audio from the incoming stream is played to the speaker.
Parameters:
[in] long nUDPPortNumber: UDP port number on which to listen for the incoming RTP audio stream
Description:
This method should be used when a WAV file needs to be transmitted. The audio from the file is mixed in with the outgoing audio stream (the user’s voice). The WINRTP fires an event to let the caller know when the file has finished playing, so that another file may be played. When the file finishes playing, the WINRTP automatically calls StopPlayingFileTX, so the caller need not call it. Only one file may be playing at a time: if this function is called while another file is already playing, an error is returned and the original file keeps playing. The function returns a unique identifier (cookie) that may be used in later calls related to this file play (to set the volume, or to stop the file). This method can also play the file continuously in a loop. By default, files start playing at 25% volume.
Parameters:
[in] BSTR Filename: the location (path) of the file to be played
[in] unsigned long Mode: specifies whether to play the file once or in a loop
· 0 : play the file continuously in a loop
· 1 : play the file once
[in] unsigned long StartPosition: unimplemented/ignored
[in] unsigned long StopPosition: unimplemented/ignored
[in, out] long * Cookie: WINRTP returns a unique ID for this instance of the file being played. This cookie should be used in later calls pertaining to this instance of the file playing
Description:
This function starts mixing audio from the specified file into the received audio stream, so that the user hears audio from both the incoming audio stream and the file. The only difference from StartPlayingFileTX is that two files can play simultaneously on the receive side instead of one. By default, files start playing at 25% volume.
Parameters:
Exactly the same as StartPlayingFileTX, but with another extra parameter
[in] unsigned long waveoutDeviceID: specifies which speaker device to play the file to. WinRTP can play the file either through the wave/speaker device opened for audio (with StartRX) or through another wave/speaker device. It is sometimes useful to play a file locally on another audio device; for example, if you are using a USB headset for speech, you may want to play ring tones for incoming calls through the speakers connected to the sound card so that they are heard loudly. See StartRX for a discussion of waveoutDeviceID.
Description:
Starts streaming on the transmit side. This method must be called before StartPlayingFileTX is called. Calling this method starts transmitting the user’s voice
Parameters:
[in] unsigned long waveinDeviceID: specifies which audio device to use for audio capture/recording. Device IDs are numbered 0 ... (number of recording devices - 1), and -1 means use the default Windows recording device. See waveInOpen() and waveInGetDevCaps() in the Windows API. Because the parameter is unsigned, -1 here means (unsigned long) -1.
Description:
Sets up WINRTP to start the receive side. It also starts playing the received audio to the speaker.
Parameters:
[in] unsigned long waveoutDeviceID: specifies which audio device to use for playback/speaker. Device IDs are numbered 0 ... (number of playback devices - 1), and -1 means use the default playback device. See waveOutOpen() and waveOutGetDevCaps() in the Windows API. Because the parameter is unsigned, -1 here means (unsigned long) -1.
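The cast mentioned above can be written out explicitly. This is only a sketch: the constant and helper names are hypothetical, and the device count would in practice come from waveOutGetNumDevs() or waveInGetNumDevs() in the Windows API.

```cpp
#include <cassert>
#include <limits>

// -1 converted to unsigned long is the largest value the type can hold.
constexpr unsigned long kDefaultAudioDevice = static_cast<unsigned long>(-1);
static_assert(kDefaultAudioDevice == std::numeric_limits<unsigned long>::max(),
              "the default-device ID is the all-ones bit pattern");

// True if the ID selects the default device or one of the
// deviceCount devices numbered 0 .. deviceCount - 1.
inline bool IsValidDeviceId(unsigned long deviceId, unsigned long deviceCount) {
    return deviceId == kDefaultAudioDevice || deviceId < deviceCount;
}
```

Passing a named constant such as kDefaultAudioDevice to StartTX or StartRX makes the intent clearer than a bare -1 and avoids signed/unsigned conversion warnings.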
Description:
Stops transmitting audio. Stops transmitting the user’s voice and files.
Parameters:
None
Description:
Stops receiving and playing audio. Stops playing the received audio stream and the files
Parameters:
None
Description:
Sets the speaker volume on the PC. This setting sets the WAVEOUT volume of the system (not the master volume).
Parameters:
[in] unsigned long volume: value between 0 and 100 where 0 = silence, and 100 = max volume. The scale is linear, so 50 = half volume
Description:
Sets the microphone volume. This setting changes the PC’s microphone volume or audio capture volume.
Parameters:
[in] unsigned long volume: value between 0 and 100 where 0 = silence, and 100 = max volume. The scale is linear, so 50 = half volume
Description:
Sets the volume of a file being played by the WINRTP.
Parameters:
[in] unsigned long cookie: the cookie that pertains to this instance of the file play. The cookie is obtained when StartPlayingFileTX(or RX) is called
[in] unsigned long volume: Volume setting, from 0 (silence) to 100 (max volume)
Description:
Stops a file being played in the transmit side
Parameters:
[in] unsigned long Cookie: Cookie that was returned when the file started playing.
Description:
Stops a file being played in the receive side
Parameters:
[in] unsigned long Cookie: Cookie that was returned when the file started playing.
The important GUIDs are
The following code snippet may be useful for more information
// CCNSMT.idl : IDL source for CCNSMT.dll
//
// This file will be processed by the MIDL tool to
// produce the type library (CCNSMT.tlb) and marshalling code.
import "oaidl.idl";
import "ocidl.idl";
[
object,
uuid(94221C4D-00F1-11D4-9D59-0060B0FC246C),
helpstring("ICCNMediaTerm Interface"),
pointer_default(unique)
]
interface ICCNMediaTerm : IUnknown
{
[helpstring("method Initialize")]
HRESULT Initialize();
[helpstring("method UnInitialize")]
HRESULT UnInitialize();
[helpstring("method StartMicrophone")]
HRESULT StartMicrophone();
[helpstring("method StopMicrophone")]
HRESULT StopMicrophone();
[helpstring("method StartAudioReceive")]
HRESULT StartAudioReceive();
[helpstring("method StopAudioReceive")]
HRESULT StopAudioReceive();
[helpstring("method StopDtmfTone")]
HRESULT StopDtmfTone();
[helpstring("method SetAudioCodecRX")]
HRESULT SetAudioCodecRX([in] long CompressionType, [in] long
MillisecPacketSize, [in] long EchoCancellationValue, [in] long G723BitRate);
[helpstring("method SetAudioCodecTX")]
HRESULT SetAudioCodecTX([in] long CompressionType, [in] long
MillisecPacketSize, [in] long PrecedenceValue, [in] long SilenceSuppression,
[in] unsigned short MaxFramesPerPacket, [in] long G723BitRate);
[helpstring("method SetAudioDestination")]
HRESULT SetAudioDestination([in] BSTR strHostName, [in] long nUDPortNumber);
[helpstring("method SetAudioReceivePort")]
HRESULT SetAudioReceivePort([in] long nUDPPortNumber);
[helpstring("method StartDtmfTone")]
HRESULT StartDtmfTone([in] long cToneAsChar, [in] long OnTime, [in] long OffTime);
[helpstring("method StartPlayingFileTX")]
HRESULT StartPlayingFileTX([in] BSTR Filename, [in] unsigned long Mode, [in,
out] long * Cookie);
[helpstring("method StartPlayingFileRX")]
HRESULT StartPlayingFileRX([in] BSTR Filename, [in] unsigned long Mode, [in]
unsigned long waveoutDeviceID, [in, out] long * Cookie);
[helpstring("method StopPlayingFileTX")]
HRESULT StopPlayingFileTX([in] unsigned long Cookie);
[helpstring("method StopPlayingFileRX")]
HRESULT StopPlayingFileRX([in] unsigned long Cookie);
[helpstring("method StartTX")]
HRESULT StartTX([in] unsigned long waveinDeviceID);
[helpstring("method StopTX")]
HRESULT StopTX();
[helpstring("method StartRX")]
HRESULT StartRX([in] unsigned long waveoutDeviceID);
[helpstring("method StopRX")]
HRESULT StopRX();
[helpstring("method SetSpeakerVolume")]
HRESULT SetSpeakerVolume([in] unsigned long deviceID, [in] unsigned long
volume);
[helpstring("method SetMicrophoneVolume")]
HRESULT SetMicrophoneVolume([in] unsigned long deviceID, [in] unsigned long
volume);
[helpstring("method SetFilePlayVolume")]
HRESULT SetFilePlayVolume([in] unsigned long cookie, [in] unsigned long
volume);
[helpstring("method NetworkMonitor")]
HRESULT NetworkMonitor([in] unsigned long Enable, [in] unsigned long
DurationMillisec);
};
[
uuid(94221C4F-00F1-11D4-9D59-0060B0FC246C),
helpstring("_ICCNMediaTermEvents Interface")
]
interface _ICCNMediaTermEvents : IUnknown
{
[helpstring("method EndOfFileEventRX")]
HRESULT EndOfFileEventRX([in] long Cookie);
[helpstring("method EndOfFileEventTX")]
HRESULT EndOfFileEventTX([in] long Cookie);
[helpstring("method NetworkMonitorEventRX")]
HRESULT NetworkMonitorEventRX([in] double RXMean, [in] double RXVariance);
[helpstring("method NetworkMonitorEventTX")]
HRESULT NetworkMonitorEventTX([in] double TXMean, [in] double TXVariance);
};
[
uuid(94221C40-00F1-11D4-9D59-0060B0FC246C),
version(1.0),
helpstring("CCNSMT 1.0 Type Library")
]
library CCNSMTLib
{
importlib("stdole32.tlb");
importlib("stdole2.tlb");
[
uuid(94221C4E-00F1-11D4-9D59-0060B0FC246C),
helpstring("CCNMediaTerm Class")
]
coclass CCNMediaTerm
{
[default] interface ICCNMediaTerm;
[default, source] interface _ICCNMediaTermEvents;
};
};
Using the type library generated while compiling WinRTP (CCNSMT.tlb), one can easily use WinRTP in code. Visual C++ 6.0 allows importing a type library with the #import directive, as the following sample code shows. Note that you cannot import WinRTP as a COM object into your project because it is NOT an ActiveX control, nor does it support IDispatch.
#import "../CCNMediaTerm/CCNSMT/CCNSMT.tlb" no_namespace, raw_interfaces_only
int main()
{
HRESULT hr;
// Initialize COM
hr = CoInitialize(NULL);
// Get Interface ICCNMediaTerm from the WinRTP COM Object using the smart pointer
// defined by the #import command above.
// Automatically calls IUnknown::AddRef();
ICCNMediaTermPtr pICCNMediaTerm(__uuidof(CCNMediaTerm));
// Initialize WinRTP. Must be the first call
pICCNMediaTerm->Initialize();
// Set parameters for receive side
pICCNMediaTerm->SetAudioCodecRX(4, 20, 0, 0);
pICCNMediaTerm->SetAudioReceivePort(8500);
// Set parameters for transmit side
pICCNMediaTerm->SetAudioCodecTX(4, 20, 0, 0, 0, 0);
// Wrap the literal in _bstr_t so that a real BSTR is passed
pICCNMediaTerm->SetAudioDestination(_bstr_t(L"127.0.0.1"), 8500);
// Start reception side. we will use the default (-1) playback device
pICCNMediaTerm->StartRX(-1);
// Start transmit side. we will use the default (-1) recording device
pICCNMediaTerm->StartTX(-1);
// Set the speaker volume to 50%
pICCNMediaTerm->SetSpeakerVolume(-1, 50);
// Set the microphone volume to 50%
pICCNMediaTerm->SetMicrophoneVolume(-1, 50);
// Hear yourself for 5 seconds
Sleep(5000);
// Stop reception & transmission
pICCNMediaTerm->StopRX();
pICCNMediaTerm->StopTX();
// Uninitialize WinRTP. Must be the last call
pICCNMediaTerm->UnInitialize();
// Let go of the reference to the ICCNMediaTerm interface. Automatically calls
// IUnknown::Release()
pICCNMediaTerm = 0;
// Uninitialize COM
CoUninitialize();
return 0;
}
The configurable parameters of WINRTP are mostly set using the registry. The registry key for these settings is HKEY_CURRENT_USER\Software\Cisco Systems\CCNMediaTerm\1.0. If these entries do not exist in the registry, then the WINRTP creates them automatically with the default values the first time it needs to use them.
Set the UseDynamicJitterBuffer registry entry to “true” to use the dynamic jitter buffer algorithm for audio reception. Set it to “false” to use the static jitter buffer (as in the old version of WinRTP).
This value is relevant only if the static jitter buffer is being used. The length of the jitter buffer is specified, in milliseconds, by the JitterBufferTime registry setting. At the beginning of each talk spurt, the WINRTP fills x milliseconds of audio into the jitter buffer (where x is the value of JitterBufferTime) before it starts playing it to the speaker. A longer jitter buffer provides smoother audio and more immunity to network problems, but increases the latency in a two-way conversation. Lowering the value too much can lead to bad audio quality (stuttering or jittery audio), in which case the user should increase the setting. The optimal value depends heavily on the configuration of the PC (sound card and drivers, operating system, etc.), so it should be set on a per-computer basis. The default value of 180 works on the majority of computers, and lower values work on most of them.
Try the following starting values: Windows 2000/XP: 60 ms; Windows NT 4.0: 120 ms; Windows 95/98/ME: 180 ms.
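The latency trade-off can be made concrete with a little arithmetic. The sketch below assumes that the prefill is counted in whole packets of MillisecPacketSize each; this guide does not say how WINRTP counts it internally, and the function name is hypothetical.

```cpp
#include <cassert>

// Number of whole packets needed to hold at least jitterBufferMs of audio
// when each packet carries packetMs of audio (rounds up).
inline long PacketsToPrefill(long jitterBufferMs, long packetMs) {
    assert(jitterBufferMs >= 0 && packetMs > 0);
    return (jitterBufferMs + packetMs - 1) / packetMs;
}
```

Under this assumption, a JitterBufferTime of 180 with 20 ms packets means nine packets are buffered at the start of each talk spurt, while a setting of 60 needs only three.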
WinRTP can automatically amplify the volume of the incoming audio streams. This is needed because the volume level in the incoming packets is frequently low, so they sound much fainter than other sounds on the PC. We could raise the volume sliders on the PC, but that would make every other sound extremely loud, so it is not a satisfactory solution. The real fix is to amplify the incoming audio to a level that is comparable to other system sounds. WinRTP 2.1 onwards has that ability.
Note that increasing the level of the incoming signal can cause distortion (due to clipping), so WinRTP employs a technique that gives the user complete control. The amplification is governed by two parameters: a max-gain (the maximum gain that will be applied if possible) and a distortion-free-percentile (the percentage of audio samples that should not be distorted by the amplification). It is best explained with an example. Say max-gain is 5 and distortion-free-percentile is 95%. Given an RTP packet with audio samples, WinRTP calculates how much gain it can apply to the samples so that 95% of them will NOT be distorted. If this value (say x) is less than max-gain, the packet is amplified x times. If x is more than max-gain, the packet is amplified max-gain times.
To ensure that no distortion occurs, set the percentile to 100% and max-gain to a high value. That way, WinRTP always amplifies the packet without distorting any sample. However, during quiet periods x might be large, which increases the loudness of the background noise. This is where setting max-gain to an optimal value helps, because it is the maximum amplification that will ever be applied. I have seen that max-gain values between 5 and 10 and a percentile of 95% produce a good balance. To turn off this feature altogether, set max-gain to 1, so that no amplification is done. The following registry keys control this feature
VolumeMaximizeMaxGain (floating point number >= 1.0 e.g. 8.0 )
VolumeMaximizePercentile (floating point number between 0.0 and 100.0 )
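The rule described above can be sketched as follows. This is an illustration under stated assumptions, not WinRTP's actual code: it assumes 16-bit signed PCM samples and a nearest-rank percentile over the sample magnitudes, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Gain for one packet: the largest gain that leaves `percentile` percent of
// the samples unclipped, capped at maxGain and never below 1.0.
inline double ComputePacketGain(const std::vector<std::int16_t>& samples,
                                double maxGain, double percentile) {
    if (samples.empty() || maxGain <= 1.0) return 1.0;
    std::vector<int> magnitudes;
    magnitudes.reserve(samples.size());
    for (std::int16_t s : samples) magnitudes.push_back(std::abs(static_cast<int>(s)));
    std::sort(magnitudes.begin(), magnitudes.end());

    // Nearest-rank percentile: the magnitude that `percentile` percent of
    // the samples do not exceed.
    const double p = std::min(100.0, std::max(0.0, percentile));
    const std::size_t rank =
        static_cast<std::size_t>(std::ceil(p / 100.0 * magnitudes.size()));
    const int reference = magnitudes[rank == 0 ? 0 : rank - 1];

    if (reference == 0) return maxGain;  // silent packet: only the cap applies
    const double distortionFreeGain = 32767.0 / reference;  // the "x" in the text
    return std::max(1.0, std::min(distortionFreeGain, maxGain));
}

// Applies the gain in place, clipping to the 16-bit range.
inline void AmplifyPacket(std::vector<std::int16_t>& samples, double gain) {
    for (std::int16_t& s : samples) {
        const double scaled = std::min(32767.0, std::max(-32768.0, s * gain));
        s = static_cast<std::int16_t>(std::lround(scaled));
    }
}
```

In this sketch a packet whose loudest sample is 4000 could be amplified about 8.19 times without clipping, so with a max-gain of 5 the cap decides and the packet is amplified 5 times.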
The WINRTP can stamp outgoing RTP packets with an IP TOS (type of service) value in the IP header. This is important for QoS purposes where packets of a certain TOS may be given priority in the network to reduce delay. To do this, you need to change the value in the RtpOut filter project (RtpOut.cpp)
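If the desired priority is known as a DiffServ code point (DSCP), it has to be converted to the full byte value before it is placed in the IP header, because the DSCP occupies the upper six bits of the byte. The helper below is a hypothetical sketch of that conversion and is not taken from RtpOut.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Converts a 6-bit DSCP value (0..63) to the 8-bit TOS/DiffServ byte.
// The two low-order bits are left at zero.
inline std::uint8_t DscpToTosByte(unsigned dscp) {
    assert(dscp <= 63);
    return static_cast<std::uint8_t>(dscp << 2);
}
```

For example, Expedited Forwarding (DSCP 46), which is commonly used for voice, corresponds to the byte value 0xB8.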
If you want to use a particular local UDP port to transmit RTP streams, set the UseFixedTransmitPort to “true” and set the TransmitPort registry entry to the port number you want to use. Otherwise, set UseFixedTransmitPort to “false”. Note the receive and transmit port cannot be the same. Make sure transmit port != receive port, and transmit port != (receive port + 1)
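An application that writes these registry entries may want to check the port constraint first. A minimal sketch, with a hypothetical function name:

```cpp
#include <cassert>

// True if transmitPort satisfies both constraints stated above:
// it differs from receivePort and from receivePort + 1.
inline bool IsTransmitPortAllowed(long transmitPort, long receivePort) {
    return transmitPort != receivePort && transmitPort != receivePort + 1;
}
```

With a receive port of 8000, transmit ports 8000 and 8001 are rejected, while 8002 is accepted.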
To do pre-emphasis of transmitted audio to make it sound sharper, set the MicrophonePreprocess registry entry to “true” (“false” otherwise) and then set the TxFIRFilter registry entry to either “1” or “2”. This chooses between a set of parameters to set up an FIR filter to do pre-emphasis of the audio. Experiment to see which setting sounds best
To do post-emphasis of received audio to make it sound sharper, set the SpeakerPostprocess registry entry to “true” (“false” otherwise) and then set the RxFIRFilter registry entry to either “1” or “2”. This chooses between a set of parameters to set up an FIR filter to do post-emphasis of the audio. Experiment to see which setting sounds best
Sometimes the received audio may be too loud and you may want to limit the maximum volume. In that case, set the LimitVolume registry entry to “true” (“false” otherwise) to turn on the volume limiting feature. Three registry settings control the behavior of the limiter: LimiterThreshold (default -8.0), LimiterLossIncrement (default 0.075), and LimiterLossDecrement (default -0.00075). Setting the threshold lower (for example, to -25.0 instead of -8.0) limits audio to a lower volume. I recommend against changing the other two parameters.
The following section describes, through an example, how to use the Media Term component. Here are the basic steps
· Initialize the WINRTP
o Initialize();
· Set the startup parameters for transmit: Use G.711 Ulaw Codec, 30ms packet size, No silence suppression. Transmit to localhost (127.0.0.1) to port 21243
o SetAudioCodecTX(4, 30, 0, 0, 1, 0);
o SetAudioDestination("127.0.0.1", 21243);
· Start transmission using the default audio capture device
o StartTX(-1);
· Mix the file “foo.wav” along with the transmitted stream. Play the file once.
o StartPlayingFileTX("foo.wav", 1, 0, 0, &sendFileCookie)
· Set the volume of the “foo.wav” file to 50%
o SetFilePlayVolume(sendFileCookie, 50)
· Change the codec from G.711 Ulaw to G.723 at 5.3kbps and turn on silence suppression
o SetAudioCodecTX(9, 30, 0, 1, 1, 0)
· Stop Transmitting (everything, including voice and files)
o StopTX()
· Set the startup parameters for receive: Use G.711 Ulaw Codec, 30ms packet size. Receiving from local port 8000
o SetAudioCodecRX(4, 30, 0, 0);
o SetAudioReceivePort(8000);
· Start receive side using the default audio playback device
o StartRX(-1);
· Mix the file “foo.wav” along with the received stream. Play the file continuously in a loop. Also play another file “foobar.wav” just once. Mix both files along with the received audio
o StartPlayingFileRX("foo.wav", 0, 0, 0, -1, &receiveFileCookie1);
o StartPlayingFileRX("foobar.wav", 1, 0, 0, -1, &receiveFileCookie2);
· Set the volume of the “foobar.wav” file to 25%
o SetFilePlayVolume(receiveFileCookie2, 25)
· Stop playing the “foo.wav” file that was playing in a loop
o StopPlayingFileRX(receiveFileCookie1);
· Change the codec from G.711 Ulaw to G.729 (30ms packet size), and also change the port to receive audio from 8000 to 9999
o SetAudioCodecRX(11, 30, 0, 0)
o SetAudioReceivePort(9999)
· Stop receiving (everything, including voice and files). This method releases the speaker
o StopRX()
I plan to release some sample C++ code to show how to use this component soon
Some of the future improvements that are being considered are