by Jason Wydra
Applying Troubleshooting Methods
In order to effectively troubleshoot IP telephony networks, you must first understand the existing LAN/WAN design. For example, if you attempt to troubleshoot a voice quality problem between two sites, and you don't understand the underlying QoS (Quality of Service) design, you're going to find it hard to isolate the cause. Furthermore, if you don't have a clear picture of the current infrastructure, you'll find it nearly impossible to pinpoint the root cause of a problem. Once you understand the design and infrastructure, you can start working on a detailed plan outlining your strategy. You should make only one configuration change at a time and then test the results. If the changes you made did not fix the problem, or made the problem worse, you should revert to the previous configuration and try another potential solution. Having a well thought out troubleshooting plan is essential.
Cisco Connection Online (CCO) should always be consulted when troubleshooting. You may be trying to isolate an issue with a router IOS that is caused by a known bug. If you don't rule out the possibility of a known software bug, you stand a chance of wasting many hours of unnecessary troubleshooting. Some problems can be fixed only by applying software upgrades. We will explore CCO tools later in this article, but first we need to look at common obstacles found in converged networks.
Today's converged networks carry a mix of data, voice, and video traffic over a common infrastructure. Traditional voice networks used, and in many cases still use, a separate infrastructure that is isolated from the data network. Converged voice networks replace the old circuit-switched TDM voice networks with packetized voice over IP, which allows voice communications to share the same infrastructure as data.
Legacy voice networks use PBXs (private branch exchanges) and key systems as their main call processing engines. Each of these devices typically has access to a dedicated analog or digital circuit connected directly to the PSTN (Public Switched Telephone Network). Digital (non-IP) or analog phones connect to the PBX. When a call is placed from a phone, it is routed based on one of two scenarios:
If the called number resides on the PSTN, the call is routed out the local analog or digital circuit.
If the called number resides within the private network, the PBX routes the call locally and does not send it to the PSTN.
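The two routing scenarios above can be sketched in a few lines of Python. This is only an illustration of the decision, not how any particular PBX is programmed. The dial plan (four-digit extensions beginning with 5) and the names PRIVATE_PREFIXES, EXTENSION_LENGTH, and route_call are assumptions made for the example.

```python
# Hypothetical dial plan: numbers on the private network are four-digit
# extensions that begin with 5. Everything else is treated as a PSTN number.
PRIVATE_PREFIXES = ("5",)
EXTENSION_LENGTH = 4


def route_call(called_number: str) -> str:
    """Return 'local' for numbers on the private network, 'pstn' otherwise."""
    digits = called_number.strip()
    on_net = (
        len(digits) == EXTENSION_LENGTH
        and digits.isdigit()
        and digits.startswith(PRIVATE_PREFIXES)
    )
    return "local" if on_net else "pstn"


if __name__ == "__main__":
    for number in ("5123", "914085551212"):
        print(number, "->", route_call(number))
```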
A company may have private leased lines, commonly known as tie lines, that connect the PBXs at its offices. Site-to-site calls can then be made across the tie lines instead of the PSTN. Nowadays, with the convergence of voice and IP networks, it is possible to packetize digital voice samples and transport the voice stream across IP-enabled private networks or even the Internet, bypassing the PSTN altogether. For instance, you can use an existing data circuit to make a call from site A to site B instead of routing the call across the PSTN. This is called toll bypass. Because the call between sites A and B travels over the existing data infrastructure, it is no longer billed as a long distance call, and companies can realize tremendous savings on monthly toll charges.
You may be asking yourself, "What if my existing data network is already at peak utilization?" This is an excellent question and is probably one of the biggest obstacles to deploying a converged network. Too many IP telephony designs are installed without an adequate site survey to determine if the existing data network can handle the added load of voice traffic. Many times, VoIP is configured without regard for QoS. QoS is crucial when mixing data and voice traffic on a common infrastructure. Please refer to the following CertificationZone Study Guides for further information on QoS.
QoS I - By Howard Berkowitz
QoS II - By Howard Berkowitz
QoS III - By Jason Wydra
Another area of concern is CAC (Call Admission Control). If bandwidth across the WAN is limited, you should limit the number of calls so that a certain amount of bandwidth remains set aside for data. CAC also protects voice quality on VoIP calls. If more calls are placed than the link has bandwidth for, packets may be dropped randomly from all calls in progress, which degrades the quality of every call. Use CAC to set a maximum number of simultaneous calls, based on the amount of bandwidth each call consumes and the total bandwidth available for voice. CallManager uses location settings as a form of CAC in a single-cluster environment. Multiple clusters require the use of a Gatekeeper. Gatekeepers will be discussed in a future CertificationZone tutorial. For more information about CAC using locations, see the CertificationZone Cisco IP Telephony Part 1 Study Guide.
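The call limit described above is simple arithmetic: divide the bandwidth reserved for voice by the bandwidth each call consumes and round down. The Python sketch below shows the calculation. The per-call figures are assumed planning values (20 ms packetization, IP/UDP/RTP headers included, layer-2 overhead excluded), and the function names are made up for this example. Substitute the numbers that match your own codecs and links.

```python
# Assumed per-call bandwidth in kbps: 20 ms packetization, IP/UDP/RTP headers
# included, layer-2 overhead excluded. Adjust these to match your own design.
CODEC_KBPS = {"g711": 80, "g729": 24}


def max_calls(voice_bandwidth_kbps: int, codec: str) -> int:
    """Largest number of simultaneous calls that fits in the voice budget."""
    return voice_bandwidth_kbps // CODEC_KBPS[codec]


def admit_call(active_calls: int, voice_bandwidth_kbps: int, codec: str) -> bool:
    """Admit a new call only if it keeps the link within the call limit."""
    return active_calls < max_calls(voice_bandwidth_kbps, codec)


if __name__ == "__main__":
    for codec in CODEC_KBPS:
        print(codec, "calls in 256 kbps:", max_calls(256, codec))
```

With 256 kbps set aside for voice, these assumptions allow three G.711 calls or ten G.729 calls. A fourth G.711 call would be refused rather than degrading the three already in progress.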
Performing a baseline network analysis is crucial to a successful network design. This is especially important with today's converged data, voice, and video networks. Baselining network performance can provide the following:
crucial information on the health of hardware and software
determination of the current capacity of network resources
information to help set network alarm thresholds
identification of network performance degradation
prediction of future failures
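As one illustration of how a baseline can feed alarm thresholds, the Python sketch below derives a threshold from a set of baseline measurements and flags later readings that exceed it. The rule used here (mean plus three sample standard deviations) is an assumption chosen for the example, not a recommendation from this tutorial. The function names and sample values are also hypothetical.

```python
from statistics import mean, stdev


def alarm_threshold(baseline_samples, k=3.0):
    """Threshold set k sample standard deviations above the baseline mean."""
    return mean(baseline_samples) + k * stdev(baseline_samples)


def degraded(current_value, baseline_samples, k=3.0):
    """True when a new reading exceeds the threshold derived from the baseline."""
    return current_value > alarm_threshold(baseline_samples, k)


if __name__ == "__main__":
    # Hypothetical baseline: link utilization (percent) sampled during busy hours.
    baseline = [40, 42, 38, 41, 39]
    print("threshold:", round(alarm_threshold(baseline), 1))
    print("60% degraded?", degraded(60, baseline))
```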
Latency-sensitive applications such as VoIP use UDP (User Datagram Protocol) at the OSI transport layer. Unlike its counterpart, TCP (Transmission Control Protocol), UDP does not retransmit lost packets, and for real-time voice a retransmitted packet would arrive too late to be played anyway. VoIP is therefore much less forgiving of delay, jitter, and packet loss than typical data applications. For this reason, it is critical to establish a network baseline before deploying VoIP, to confirm that the infrastructure is ready for the added load.
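Because lost packets are never resent, loss on a voice stream shows up as gaps in the packet sequence. The Python sketch below estimates the loss fraction from the sequence numbers a receiver actually saw. It assumes each packet carries an increasing sequence number, as RTP packets do, and it ignores sequence number wraparound for simplicity. The function name is made up for this example.

```python
def loss_fraction(received_seq_numbers):
    """Fraction of packets missing between the lowest and highest sequence seen.

    Assumes sequence numbers increase by one per packet and do not wrap.
    """
    seen = set(received_seq_numbers)
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1
    return (expected - len(seen)) / expected


if __name__ == "__main__":
    # Packets 4 and 7 never arrived.
    print(loss_fraction([1, 2, 3, 5, 6, 8, 9, 10]))
```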
The importance of documentation cannot be overstated. Troubleshooting a network without detailed information about configurations and physical layouts of devices is a lost cause. This information, in the form of Visio diagrams, Excel spreadsheets, or Word documents, should be readily available. Don't forget to document the passwords for your hardware. Failing to do so is probably one of the most common mistakes. Problems occur when network administrators leave the company having kept the passwords only in their heads, or having stored them somewhere that no one else knows about.
Finally, be sure to keep your baseline and documentation in a secure area. Whether your documentation is located on a network file server or a physical binder, be sure that only those with a need to know have access to it.
What happens when you get a call in the middle of the night saying that the entire voice network is down? How do you deal with the IP phone users who refuse to adjust to the new technology and spend their days reporting minor problems that they didn't have with the legacy system? For scenarios like these, you must have a well-thought-out troubleshooting plan. The best way to approach an issue is to work from the bottom of the OSI model up. For review, the OSI layers are shown below.
Figure 1. OSI Reference Model Layers
End-to-end communications make use of the OSI model on each side of the connection. Think of two PCs communicating with each other. The communication starts at the Application layer of the OSI model. The transaction flows down the OSI model to the Physical layer, where the data frames are placed on the transmission media as bits and sent to the other PC. Once the data reaches the other end, it is passed back up from the Physical layer to the Application layer and displayed on the user's PC in a human-usable form.
The bottom-up method of troubleshooting refers to starting at the Physical layer of the OSI model. Many problems can be fixed without moving to another layer, which makes the Physical layer an excellent place to start. Check your physical connections. Make sure the devices have power. Sometimes equipment may be locked up and a simple reset solves the problem. Be careful before restarting equipment, though. In some cases, this can cause more problems than you'd expect. For example, if a router's configuration was not saved after changes were made, resetting the router will discard the unsaved changes.
If a problem cannot be fixed at the Physical layer, you must move up the OSI model and check for logical connectivity between the devices. Use tools such as traceroute and ping to verify end-to-end connectivity through the Network layer of the OSI model. The Transport layer is where the transition begins between communication that is coordinated by the network (the lower, media layers) and communication that is coordinated by the applications on the end devices. At this point, you may need to troubleshoot software.
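The bottom-up method can be thought of as an ordered list of checks that stops at the first layer that fails. The Python sketch below illustrates the idea. The checks are hypothetical stand-ins supplied by the caller (for example, "is the link light on?" or "does ping succeed?"), and the function name is made up for this example. The sketch does not probe a real network.

```python
from typing import Callable, List, Optional, Tuple

# Each check pairs an OSI layer name with a function that returns True on success.
Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks from the Physical layer upward; return the first that fails."""
    for layer_name, check in checks:
        if not check():
            return layer_name
    return None


if __name__ == "__main__":
    # Stand-in results: cabling and link are fine, but ping fails.
    checks = [
        ("Physical", lambda: True),
        ("Data Link", lambda: True),
        ("Network", lambda: False),
        ("Transport", lambda: True),
    ]
    print("Start troubleshooting at:", first_failing_layer(checks))
```

Ordering the list from the Physical layer upward means a simple cabling or power fault is reported before any time is spent on higher layers.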
Using the OSI model as a guide to decide on a troubleshooting methodology establishes a structured approach to any problem. Many engineers like to jump to a certain layer right away. This may cause them to overlook a simple physical layer problem. Be very thorough in your troubleshooting steps. Never make more than one change at a time without testing the result. If the change you made did not correct the problem, revert to the previous configuration and try another potential solution.
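The rule of making one change at a time, testing, and reverting on failure can be written as a short loop. The Python sketch below is a minimal illustration. Each candidate fix is represented by a hypothetical pair of apply and revert functions, and the test is whatever check confirms that the problem is gone. All names here are assumptions made for the example.

```python
def try_fixes(candidate_fixes, problem_resolved):
    """Apply fixes one at a time; revert any fix that does not resolve the problem.

    candidate_fixes is a list of (name, apply, revert) tuples.
    Returns the name of the fix that worked, or None if none did.
    """
    for name, apply, revert in candidate_fixes:
        apply()
        if problem_resolved():
            return name
        revert()  # restore the previous configuration before the next attempt
    return None


if __name__ == "__main__":
    # Stand-in device state; the real problem here is the duplex setting.
    state = {"qos": False, "duplex": "half"}
    fixes = [
        ("enable qos", lambda: state.update(qos=True), lambda: state.update(qos=False)),
        ("set full duplex", lambda: state.update(duplex="full"), lambda: state.update(duplex="half")),
    ]
    print(try_fixes(fixes, lambda: state["duplex"] == "full"), state)
```

Because the first change is reverted before the second is tried, the device ends up with only the change that actually fixed the problem.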
For more information on the OSI model, refer to the CertificationZone tutorial OSI Reference Model, 2nd Edition.
Cisco TAC's website provides a comprehensive collection of technical documentation and training material as well as an abundance of tools to help with troubleshooting Cisco networks. Some of the more common troubleshooting tools are explained below. Keep in mind that many of these tools require CCO access (a login and password).
Cisco Bug Toolkit - This tool can be used to search for software bugs related to Cisco IOS, CatOS, or specific platforms and applications.
Error Message Decoder - This tool allows you to paste in an error message from Cisco software. The decoder will respond with a suggested action to resolve the error.
Output Interpreter - This tool allows you to paste information from a router show command. The engine will provide troubleshooting analysis and then return a suggested course of action.
TAC Case Collection - This tool allows you to search previously created TAC cases for common resolutions to frequently reported problems. The tool provides access to an extensive history of TAC cases.