As with the Public Switched Telephone Network (PSTN), carrying real-time media such as voice over an IP network requires call setup and teardown as well as some form of management, and therefore signalling and management protocols. This article looks only at the Internet protocols that were developed to deliver the media samples themselves in a timely and ordered fashion.
RTP (Real-time Transport Protocol) was designed by the Internet Engineering Task Force (IETF) specifically to carry real-time media samples across the network. It does not guarantee delivery by itself; instead it gives the receiver the information it needs to detect loss and to restore the order and timing of the samples. It is often described as an application-level framing protocol and operates within the upper layers of the TCP/IP protocol suite, above the transport layer. Digitized real-time media samples are packetised by adding an RTP header, which carries source identification and sequence numbers so that the receiver can put the samples back into the correct order.
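The packetisation step can be illustrated with a minimal sketch of the 12-byte fixed RTP header from RFC 3550 (version and flags, payload type, 16-bit sequence number, 32-bit timestamp, 32-bit source identifier). The names `RtpHeader`, `pack_rtp` and `unpack_rtp` are hypothetical and chosen only for this example, and the sketch assumes no padding and no header extension.

```python
import struct
from typing import NamedTuple, Tuple

RTP_VERSION = 2
# Fixed RTP header: 2 flag/type bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order (12 bytes).
_FIXED_HEADER = struct.Struct("!BBHII")


class RtpHeader(NamedTuple):
    """Hypothetical container for the fields this sketch handles."""
    payload_type: int   # 7-bit codec identifier, e.g. 0 = PCMU (G.711 mu-law)
    sequence: int       # 16-bit, incremented by one per packet
    timestamp: int      # 32-bit, counted in units of the media clock
    ssrc: int           # 32-bit synchronisation source identifier
    marker: bool = False


def pack_rtp(header: RtpHeader, payload: bytes) -> bytes:
    """Prepend a fixed RTP header to a block of media samples."""
    byte0 = RTP_VERSION << 6  # assumption: no padding, no extension, no CSRCs
    byte1 = (int(header.marker) << 7) | (header.payload_type & 0x7F)
    return _FIXED_HEADER.pack(
        byte0,
        byte1,
        header.sequence & 0xFFFF,
        header.timestamp & 0xFFFFFFFF,
        header.ssrc & 0xFFFFFFFF,
    ) + payload


def unpack_rtp(packet: bytes) -> Tuple[RtpHeader, bytes]:
    """Split a packet into header fields and payload (extensions ignored)."""
    if len(packet) < _FIXED_HEADER.size:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, seq, ts, ssrc = _FIXED_HEADER.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("unsupported RTP version")
    csrc_count = byte0 & 0x0F            # each CSRC entry is 4 bytes
    offset = _FIXED_HEADER.size + 4 * csrc_count
    header = RtpHeader(byte1 & 0x7F, seq, ts, ssrc, bool(byte1 >> 7))
    return header, packet[offset:]
```

For example, 20 ms of 8 kHz G.711 audio is 160 one-byte samples, so `pack_rtp(RtpHeader(0, 1, 160, 0x12345678), samples)` would produce a 172-byte packet that is then handed to the transport layer.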
RTP works in conjunction with another protocol, the RTP Control Protocol (RTCP), which was designed to provide some rudimentary quality-of-service feedback. RTP itself provides an end-to-end delivery service for applications that require real-time support, such as voice and video. It does this through source identification, sequence numbering, timestamping of data, payload type identification and, finally, delivery monitoring. The sequence numbers allow the receiver of the media to reconstruct the sender’s packet sequence, while the timestamp allows the receiver to buffer the incoming data and play it out at the constant rate required by the receiving Codec (Coder Decoder). A jitter buffer assists in this by smoothing out variations in delay between the groups of media samples carried in a sequence of IP (Internet Protocol) packets. A field in the header indicates the payload type, identifying the specific Codec used to digitize the original voice samples. Each media stream is identified by a Synchronisation Source identifier (SSRC), which identifies the source of the data.
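The reordering role of the sequence number can be sketched with a fixed-depth buffer. This is a much-simplified stand-in for a jitter buffer: a real implementation is driven by the RTP timestamp and a playout clock, and must handle sequence number wrap-around and duplicates, none of which is attempted here. The class name `JitterBuffer` and its `depth` parameter are hypothetical.

```python
import heapq
from typing import Iterator, List, Optional, Tuple


class JitterBuffer:
    """Hold back a few packets so late arrivals can be put back in order.

    Assumptions for this sketch: sequence numbers do not wrap, and the
    buffer depth is counted in packets rather than milliseconds.
    """

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth
        self._heap: List[Tuple[int, bytes]] = []

    def push(self, seq: int, payload: bytes) -> None:
        """Store an arriving packet, keyed by its RTP sequence number."""
        heapq.heappush(self._heap, (seq, payload))

    def pop(self) -> Optional[Tuple[int, bytes]]:
        """Release the lowest-numbered packet once the buffer is full."""
        if len(self._heap) < self.depth:
            return None  # still filling; the codec must wait
        return heapq.heappop(self._heap)

    def drain(self) -> Iterator[Tuple[int, bytes]]:
        """Flush whatever is left, in sequence order, at end of stream."""
        while self._heap:
            yield heapq.heappop(self._heap)
```

A larger `depth` tolerates more delay variation but adds to the end-to-end delay, which is the trade-off any jitter buffer has to make.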
RTCP is used to monitor the flow of data into the receiver and to relay this information back to the source endpoints, which can use it to adjust their output rate and/or choice of Codec. Because the end-to-end delay budget for conversational voice and video is so tight, it is not practical to use a transport layer protocol that retransmits lost data, such as TCP (Transmission Control Protocol). For this reason UDP (User Datagram Protocol) is used to transport the RTP packets. RTP itself allows lost and/or out-of-order packets to be identified and provides the means to synchronise multiple audio and video streams. What it does not do, however, is provide media quality information to the participants in a media session. RTCP, as a companion protocol to RTP, provides this service through the use of Sender and Receiver Reports.
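The kind of figures a receiver gathers for a Receiver Report can be sketched as follows. The jitter estimate uses the running formula from RFC 3550, J = J + (|D| - J)/16, where D is the change in transit time between consecutive packets measured in RTP timestamp units. The class name `ReceiverStats` is hypothetical, the 8000 Hz clock rate is an assumption suited to narrowband voice, and sequence number wrap-around is deliberately ignored.

```python
from typing import Optional


class ReceiverStats:
    """Track packet loss and interarrival jitter for one RTP stream.

    Assumptions for this sketch: sequence numbers do not wrap, and
    arrival times are supplied by the caller in seconds.
    """

    def __init__(self, clock_rate: int = 8000) -> None:
        self.clock_rate = clock_rate
        self.first_seq: Optional[int] = None
        self.highest_seq: Optional[int] = None
        self.received = 0
        self.jitter = 0.0  # in RTP timestamp units
        self._last_transit: Optional[float] = None

    def update(self, seq: int, rtp_timestamp: int, arrival_seconds: float) -> None:
        """Record one received packet."""
        if self.first_seq is None:
            self.first_seq = self.highest_seq = seq
        elif seq > self.highest_seq:
            self.highest_seq = seq
        self.received += 1

        # Transit time: arrival time (converted to timestamp units)
        # minus the sender's RTP timestamp.
        transit = arrival_seconds * self.clock_rate - rtp_timestamp
        if self._last_transit is not None:
            d = abs(transit - self._last_transit)
            self.jitter += (d - self.jitter) / 16.0
        self._last_transit = transit

    @property
    def expected(self) -> int:
        """Packets that should have arrived, judged by sequence numbers."""
        if self.first_seq is None:
            return 0
        return self.highest_seq - self.first_seq + 1

    @property
    def lost(self) -> int:
        """Packets expected but not received."""
        return self.expected - self.received
```

Feeding in packets with sequence numbers 1, 2 and 4 would give `lost == 1`; the loss count and jitter are among the values a Receiver Report carries back to the sender.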
There are many more issues related to the use of RTP and RTCP, such as the choice of UDP port numbers, the traversal of firewalls and the use of VoIP with NAT (Network Address Translation). These issues will be covered in a further article.
VoIP and the use of RTP and RTCP are covered in detail in a number of our training courses including An Introduction to VoIP and VoIP with SIP.