Voice Over IP Telephony: Sizzle or Steak?

By Roy L. Morris

Adjunct Professor, Graduate Division

Capitol College, Laurel, Maryland*

 

[Note: Mr. Morris has recently completed teaching a graduate engineering course in telephony/data communications and its convergence at Capitol College, and he may be reached by e-mail at Roy_Morris@Alum.MIT.EDU]

 

In sales, there is an old adage, "sell the sizzle and not the steak." In the telephony business, unlike the data networking business, novel approaches to providing service have been few and far between over the last ten years or so. Thus it is with much skepticism that I sought to inquire about the significance of the new "voice over IP" telephone application in the common carrier arena.

Much has been made over the last year concerning the use of IP (Internet Protocol) as a vehicle for transporting voice, and mixing that voice traffic with data traffic. Entire businesses (e.g., Qwest, Level 3, etc.) have been created based on the use of IP as a differentiator between these firms and the incumbent firms. The question we seek to answer here is whether the use of IP as this transport vehicle is a substantive differentiator that actually represents an advance in the technology for providing voice services, or primarily a marketing differentiator. That is, is it the sizzle or is it the steak?

What is this IP Stuff, Anyway?

 

The initials "IP" stand for "Internet Protocol." Although the Internet (i.e., the "Internet" with a big "I") uses IP as its "network" protocol, the IP protocol is often used in private networks, as well. In order to understand what this statement means, one needs to understand how data networks work.

Data networks are made up of links. Links are simply physical or logical connections between nodes. Think of each link as one block of a city street, and nodes as intersections of those streets. Together, the links and nodes make up a network.

At each node (i.e., the places where the links intersect), there are traffic control systems that control how traffic may enter each link and assure the integrity of traffic carried over that link. Ethernet and Frame Relay are two examples of traffic control systems that control the flow of data over any given link.

In addition to a traffic control system for each link, there is a traffic control system for the network as a whole. This overall network traffic control system is known as the "network protocol." The network protocol basically controls the flow of traffic among links, i.e., how and when traffic leaving one link will flow onto another in order to get that traffic to its ultimate destination. IP is just one type of network protocol. Another network protocol is the often-referred-to ATM or "Asynchronous Transfer Mode."
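The street-map picture above can be sketched in a few lines of Python. This is only an illustration: the node names, the list of links, and the helper functions are hypothetical, and the search simply finds the route that crosses the fewest links. Real network protocols such as IP decide the next link hop by hop using routing tables, which this sketch does not attempt to model.

```python
from collections import deque

# Hypothetical network: each pair is one link (one "block") between two nodes.
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]

def build_network(links):
    """Turn a list of two-way links into a node -> neighbors map."""
    network = {}
    for a, b in links:
        network.setdefault(a, []).append(b)
        network.setdefault(b, []).append(a)
    return network

def find_route(network, source, destination):
    """Breadth-first search: return the route that crosses the fewest links."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            route = []
            while node is not None:      # walk back to the source
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbor in network[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None                          # no route exists

network = build_network(LINKS)
route = find_route(network, "A", "D")
print(route)
```

In this toy network, traffic from A to D has two candidate paths, and the search picks the one through E because it crosses two links rather than three.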

IP telephony is not about Digital Transmission

 

The use of the IP protocol has no bearing on whether the voice signal will be transmitted in a digital or analog form. Virtually all voice transmission within public carrier networks today is digital rather than analog. Moreover, all such digital transmission is done using circuit switching, rather than packet switching or, more specifically, the IP protocol.

Digital transmission of voice signals simply refers to the transmission of voice signals in the form of their binary equivalent (i.e., "bits," or "1"s and "0"s) as compared to a continuous time varying "analog" signal. Digital transmission has some advantages over analog transmission. Those advantages include digital’s relative immunity to noise, and reduced maintenance costs. But, as mentioned, these advantages are available in today’s circuit switched digital environment, just as they are with a packet switched digital environment such as IP.
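As a rough illustration of what "binary equivalent" means, the sketch below samples a test tone and turns each sample into an 8-bit number. The figures of 8,000 samples per second and 8 bits per sample are the conventional telephone values and are assumptions here, since the article does not state them. The function names and the 1 kHz tone are hypothetical, and the quantization is deliberately linear for simplicity; carrier equipment uses companded coding instead.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumed: conventional telephone sampling rate
BITS_PER_SAMPLE = 8     # assumed: conventional telephone sample size

def digitize(signal, duration_s):
    """Sample signal(t), expected in -1.0..1.0, and quantize each sample linearly."""
    levels = 2 ** BITS_PER_SAMPLE
    count = int(round(duration_s * SAMPLE_RATE_HZ))
    codes = []
    for i in range(count):
        t = i / SAMPLE_RATE_HZ
        x = max(-1.0, min(1.0, signal(t)))            # clip to the valid range
        code = min(levels - 1, int((x + 1.0) / 2.0 * levels))
        codes.append(code)                            # integer in 0..255
    return codes

def tone(t):
    """Hypothetical 1 kHz test tone standing in for a voice signal."""
    return math.sin(2 * math.pi * 1000 * t)

codes = digitize(tone, 0.001)                  # one millisecond of signal
bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE    # bits per second for one channel
print(codes, bit_rate)
```

Under these assumptions one voice channel needs 64,000 bits per second, and it needs them continuously for the length of the call, whether the bits ride on a circuit or in packets.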

For Predominantly Voice Communications, Packet Switching Is Not Preferred Over Circuit Switching

 

Without qualification, if one had to choose between a circuit switched network and a packet switched network for carrying voice, circuit switched would be a better fit for the needs of voice communications. This has most to do with the fact that voice communications typically demands the continuous use of a voice grade channel. Packet switching is ideal for "bursty" communications, and most data communications tend to be bursty. Voice signals, however, are not.

Also, packet switching is generally prone to end-to-end transmission delays, and voice communications is highly sensitive to delays or irregularities in the transmission of voice signal components. Even when voice is transmitted in digital form, the transmission delay of all voice samples must be small (so as to be imperceptible to the listener), and the digital signal samples must also arrive at the receiving end at regular intervals. Any significant form of delay, whether delay of all samples or of one sample relative to another, can significantly degrade the quality of the voice signal recreated at the far end of the transmission.
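The two kinds of delay described above can be separated with a small calculation. The sketch below is illustrative only: the timestamps are invented, the 20 millisecond spacing between voice packets is an assumption, and the function name is hypothetical. It reports the worst end-to-end delay and the worst deviation of the arrival spacing from the sending spacing.

```python
def delay_and_jitter(send_ms, recv_ms):
    """Return (worst end-to-end delay, worst deviation from the nominal spacing)."""
    delays = [r - s for s, r in zip(send_ms, recv_ms)]
    nominal_gap = send_ms[1] - send_ms[0]              # spacing used by the sender
    gaps = [b - a for a, b in zip(recv_ms, recv_ms[1:])]
    jitter = max(abs(gap - nominal_gap) for gap in gaps)
    return max(delays), jitter

# Invented timestamps in milliseconds; packets are sent every 20 ms (assumed).
send = [0, 20, 40, 60, 80]
steady = [50, 70, 90, 110, 130]       # constant delay, perfectly regular arrivals
congested = [50, 95, 100, 150, 152]   # variable delay, irregular arrivals

print(delay_and_jitter(send, steady))      # (50, 0)
print(delay_and_jitter(send, congested))   # (90, 30)
```

The steady case has delay but no irregularity. The congested case has both, and it is the irregularity that forces the receiver either to buffer (adding still more delay) or to play out gaps.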

Packet switching (of which IP is just one variety) during network congestion is very much like traffic traveling through midtown Manhattan during rush hour; the "symptom" of too much traffic is significant delays in transit time. That is, packet congestion causes transmission delays. Congestion on a packet network can be relieved either by dropping packets or by storing packets in memory locations throughout the network until they can get through. [This is what is done to certain traffic in congested frame relay networks.] Neither of these congestion solutions is compatible with delivering voice quality comparable to that of circuit switched voice.

In contrast to a packet network which "slows down" when congested, circuit switched networks simply block new calls from entering the network to begin with. In contrast to a packet switched network, a congested circuit switched network does not degrade the quality of the voice transmissions for connections already established at the time the congestion occurs.
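The difference between the two congestion behaviors can be seen in a toy model. The sketch below is a simplification under stated assumptions: time advances in fixed ticks, the traffic figures are invented, the function names are hypothetical, and admitted calls are assumed to hold their circuits for the entire run.

```python
def packet_backlog(offered_per_tick, capacity_per_tick):
    """Packet link: accept everything and queue what cannot be sent this tick."""
    backlog = 0
    history = []
    for offered in offered_per_tick:
        backlog = max(0, backlog + offered - capacity_per_tick)
        history.append(backlog)          # packets still waiting at end of tick
    return history

def circuit_blocking(attempts_per_tick, circuits):
    """Circuit link: admit calls only while a circuit is free and block the rest.

    Simplification: every admitted call holds its circuit for the whole run.
    """
    in_use = 0
    blocked = []
    for attempts in attempts_per_tick:
        admitted = min(attempts, circuits - in_use)
        in_use += admitted
        blocked.append(attempts - admitted)
    return blocked

backlog = packet_backlog([8, 12, 15, 15, 6], capacity_per_tick=10)
blocked = circuit_blocking([4, 4, 4, 4], circuits=10)
print(backlog)   # [0, 2, 7, 12, 8]
print(blocked)   # [0, 0, 2, 4]
```

On the packet link the waiting backlog, and therefore the delay seen by every user, grows while the overload lasts. On the circuit link the late callers are turned away, but the ten calls already in progress are untouched.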

Given the intolerance of voice signals to delays, the relatively "delay-free" circuit switched network is better designed for accommodating the biggest problem facing voice communications – namely, delay.

If Circuit Switching is Better Than Packet Switching for Voice, then Why the Enthusiasm for Packet Switched Voice?

 

Packet switched voice primarily makes sense where there is a need or driving desire to place voice on a network that primarily carries data traffic. It makes little sense at this time for existing carriers who primarily carry voice to junk their circuit switched infrastructure (which is rather optimal for voice communications) and convert over to a packet switched infrastructure (which is rather optimal for the relatively delay insensitive world of data communications).

For most carriers today, voice communications remains the "killer application." Although voice communications continues to dominate the landscape today, this will ultimately change as data communications grows at 100% per year, while voice only increases by 8 to 10% per year. Thus, the dabbling of major carriers in the packet switched voice world (or more specifically, the voice over IP world) only makes sense to the extent that these carriers wish to offer a data network service that their data customers want to use for voice communications as well. For example, a multi-location user that uses a commercial data network service to connect its locations might want to gain some economies of scope by using that same network for voice communications, as well. It has been estimated that companies can lower their communications costs by as much as 40% by placing their voice traffic in the unused space in their data networks for a "free ride."
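To see how quickly those growth rates change the picture, consider the following rough calculation. The growth rates come from the paragraph above, but the starting ratio of ten units of voice traffic to one unit of data traffic is purely an assumption chosen for illustration, and the function name is hypothetical.

```python
def years_until_data_exceeds_voice(voice, data, voice_growth, data_growth):
    """Compound both traffic volumes yearly until data is larger than voice."""
    if data_growth <= voice_growth:
        raise ValueError("data must grow faster than voice to ever overtake it")
    years = 0
    while data <= voice:
        voice *= 1 + voice_growth
        data *= 1 + data_growth
        years += 1
    return years

# Assumed starting point: voice traffic is ten times data traffic.
low = years_until_data_exceeds_voice(10, 1, voice_growth=0.08, data_growth=1.00)
high = years_until_data_exceeds_voice(10, 1, voice_growth=0.10, data_growth=1.00)
print(low, high)   # 4 4
```

Even with a tenfold head start for voice, data overtakes it in about four years at either end of the 8 to 10% range. A different starting ratio shifts the date but not the conclusion.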

ATM, not IP, Provides Data Network Characteristics That Make Voice On Data Consolidation Ultimately Possible

 

To make such user-specific voice-on-data consolidation possible (as discussed in the previous section), the commercial data network service must be able to provide voice transmission quality (including minimal delay) for the voice traffic.

The IP protocol was neither designed to, nor is presently equipped to, offer a workable system for the duality of quality of service (i.e., minimal delay for voice, reasonable delay for data) that a data network needs in order for voice to be effectively placed on it. The original IP was designed to treat all traffic the same. Thus, congestion on an IP network equally delays all traffic, whether it is voice or data. While there are some quality of service features that are being made available for the IP protocol (e.g., RSVP, or Resource Reservation Protocol), these have not been widely adopted in inter-networking situations. One reason for this failure of RSVP acceptance is that backbone providers are reluctant to allow unknown third parties to control and prioritize the traffic through the backbone provider’s network.

ATM has become widely recognized as the only protocol that can accommodate a workable mixed voice and data environment. ATM has two very important features that make it the protocol of choice. First, ATM has a number of "classes of service," which allow voice traffic to be given priority over data traffic when congestion occurs. Second, ATM uses relatively short fixed length packets (called "cells"), rather than variable length packets. This means that longer length packets cannot dominate the use of a transmission channel at the expense of short length packets, or vice versa. As a result, qualities of service of the ATM protocol are enforced neutrally among all packets, based solely on their class of service, and not the length of the packets.
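The effect of packet length on a shared channel is easy to quantify. The sketch below compares how long a link is occupied by one long variable length packet versus one 53-byte ATM cell. The T1 link speed and the 1,500-byte packet size are assumptions chosen for illustration, and the function name is hypothetical.

```python
T1_BITS_PER_SECOND = 1_544_000   # assumed link speed (a T1 line)
LONG_PACKET_BYTES = 1500         # assumed size of a large data packet
ATM_CELL_BYTES = 53              # fixed ATM cell size, header included

def serialization_delay_ms(size_bytes, link_bps):
    """Time the link is busy clocking out one packet, in milliseconds."""
    return size_bytes * 8 / link_bps * 1000

long_packet_ms = serialization_delay_ms(LONG_PACKET_BYTES, T1_BITS_PER_SECOND)
cell_ms = serialization_delay_ms(ATM_CELL_BYTES, T1_BITS_PER_SECOND)
print(round(long_packet_ms, 2), round(cell_ms, 2))
```

On these assumptions a voice sample that arrives just behind a long data packet waits close to 8 milliseconds for the link, and that wait varies with whatever data happens to be ahead of it. Behind a single cell the wait is roughly a quarter of a millisecond and is always the same.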

The only problem with ATM is that it is relatively expensive and not widely deployed at this time.

The Bottom Line

Packet switched voice makes sense when 1) most traffic is data and 2) the protocol used on the data network can effectively provide quality of service differentiation between voice and data such that during congestion situations voice services will not suffer delays. A careful examination of Sprint’s newly proposed ION services indicates that the hybrid data/voice services provided over one transmission pipe to the ION customer will be built on an ATM fabric, such that voice services will be given the priorities that only ATM can offer at this time. The IP protocol does not (and could not) play any significant role in the transmission infrastructure of such hybrid networks. Instead, to the extent voice over IP is offered, it is typically offered using a pseudo-quality of service for the IP by mapping voice and data IP sessions onto different ATM "virtual circuits," with the voice ATM virtual circuits having a higher priority than the data virtual circuits.
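The virtual circuit mapping just described amounts to serving a high priority queue before a low priority one. The sketch below illustrates only that idea: the circuit labels, traffic items, and function name are hypothetical, and it assumes strict priority applied to a backlog that is already waiting, which is far simpler than the scheduling an actual ATM switch performs.

```python
import heapq

# Hypothetical virtual circuit labels; a lower number means higher priority.
VOICE_VC = 0
DATA_VC = 1

def service_order(arrivals):
    """Serve waiting traffic by circuit priority, then by order of arrival."""
    heap = []
    for sequence, (vc, payload) in enumerate(arrivals):
        heapq.heappush(heap, (vc, sequence, payload))
    order = []
    while heap:
        _, _, payload = heapq.heappop(heap)
        order.append(payload)
    return order

# Traffic waiting at a congested link, listed in order of arrival.
waiting = [(DATA_VC, "d1"), (DATA_VC, "d2"), (VOICE_VC, "v1"),
           (DATA_VC, "d3"), (VOICE_VC, "v2")]
print(service_order(waiting))   # ['v1', 'v2', 'd1', 'd2', 'd3']
```

The voice items leave first even though data arrived ahead of them, which is the behavior that plain IP, treating all traffic the same, does not provide on its own.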

The bottom line is that "Voice Over IP" is actually just sizzle, while voice over ATM is really the steak, and the steak is relatively expensive. Until these economic and technical factors change, circuit switched voice will (and should) dominate the landscape.