Unless the protocol itself provides a remedy, routing loops can easily occur in a mesh network (a network allowing multiple paths between destinations). A routing loop prevents some packets from being routed properly because incorrect routing information is circulating in the network. The symptom of such a routing loop is counting to infinity (see Figure 3): routing updates reporting an unreachable network are incorrectly overwritten by older routing information, and the metric gradually increases as it is passed from router to router. Unless some limit is placed on the metric to indicate that the network is unreachable (for IP RIP it is 16 hops), the count will continue indefinitely. However, this value of infinity also determines the maximum diameter of the network, so the network administrator should carefully check whether the limit fits the actual network.
Figure 3. Routing Loop Creation (panels: starting point with routers converged; network 10.4.0.0 detected down by router C; router B advertises its old information about 10.4.0.0; incorrect routing information causes routing loops)
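The counting-to-infinity behavior described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: two routers, B and C, exchange updates in strict alternation, B starts with a stale metric of 1 hop, and the function name and parameters are hypothetical rather than taken from any RIP implementation.

```python
INFINITY = 16  # IP RIP treats a metric of 16 hops as unreachable


def count_to_infinity(start_b=1, infinity=INFINITY):
    """Simulate B and C trading stale routes for a network that is down.

    Returns a list of (metric_at_B, metric_at_C) pairs, one per exchange.
    The starting metric for B is an assumed value for illustration.
    """
    metric_b = start_b    # B still holds its old route (via C)
    metric_c = infinity   # C has just detected the failure
    history = [(metric_b, metric_c)]
    while metric_b < infinity or metric_c < infinity:
        # C believes B's stale advertisement and adds one hop.
        metric_c = min(metric_b + 1, infinity)
        # B's route points at C, so B accepts C's new, worse metric.
        metric_b = min(metric_c + 1, infinity)
        history.append((metric_b, metric_c))
    return history


if __name__ == "__main__":
    for b, c in count_to_infinity():
        print(f"B={b:2d}  C={c:2d}")
```

Without the cap at 16, the loop would never terminate, which is the reason a finite "infinity" is needed.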
Three modifications to the distance vector protocol have been developed in an attempt to reduce the chance of routing loops:
Split horizon -- Prevents loops between adjacent routers. Rule: Never advertise a route out of the interface through which you learned it!
Poison reverse -- Prevents larger loops. Rule: Once you learn of a route through an interface, advertise it as unreachable back through that same interface!
Holddown timer -- Prevents incorrect route information from entering routing tables. Rule: After a route is advertised as down, do not listen to routing updates on that route for a specific period of time!
Each of the above mechanisms may be used in combination with the others. Indeed, Cisco supports both split horizon and poison reverse (setting the metric to infinity, that is, 16) in its IP RIP implementations.
Split horizon is a basic technique used to reduce the chance of routing loops. It states that it is never useful to send information about a route back in the direction from which that information came. In practice, the direction is determined by the interface on which the route was learned, not by the individual neighbor.
Note that this rule works well not only for routes learned via a distance vector routing protocol but also for routes installed in a routing table as directly connected networks. Because they reside on the same network, the neighbors do not need any advertisement of a path to that shared network.
The split horizon rule helps prevent two-node (two-neighbor) routing loops and also improves performance by eliminating unnecessary updates.
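The split horizon rule can be sketched as a filter applied when a router builds the update for one interface. The `Route` type, its field names, and the interface names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    metric: int
    learned_on: str  # interface the route was learned on (or is connected to)


def split_horizon_update(table, out_interface):
    """Return the (prefix, metric) pairs to advertise out of out_interface.

    Routes learned on (or directly connected to) out_interface are omitted.
    """
    return [(r.prefix, r.metric) for r in table if r.learned_on != out_interface]


if __name__ == "__main__":
    table = [
        Route("10.1.0.0", 1, "e0"),
        Route("10.2.0.0", 2, "e1"),
        Route("10.3.0.0", 0, "e1"),  # directly connected on e1
    ]
    print(split_horizon_update(table, "e1"))
```

Note that the filter keys on the interface, not on the neighbor, which matches the rule as stated above.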
Whereas split horizon should prevent routing loops between neighbor routers, poison reverse updates are intended to defeat larger routing loops. While the simple split horizon scheme omits routes learned from one neighbor in updates sent to that neighbor, split horizon with poison reverse includes such routes in updates, but sets their metrics to infinity.
Split horizon prevents loops between neighbors (tight loops) by not advertising the routes on the same interface from which they were learned.
Split horizon with poison reverse allows the routing protocol to advertise all routes out an interface, but those learned from earlier updates coming into that interface are marked with infinite distance metrics.
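The difference from simple split horizon can be shown with a similar sketch: instead of dropping routes learned on the outgoing interface, the update keeps them but sets the metric to infinity. The function name and the tuple layout of the table are assumptions made for this example, and the table is assumed to hold only routes learned from updates.

```python
INFINITY = 16  # IP RIP's unreachable metric


def poison_reverse_update(table, out_interface, infinity=INFINITY):
    """Build the update for out_interface using split horizon with poison reverse.

    table is a list of (prefix, metric, learned_on) tuples.
    Every route is advertised, but routes learned on out_interface
    carry an infinite metric.
    """
    update = []
    for prefix, metric, learned_on in table:
        advertised = infinity if learned_on == out_interface else metric
        update.append((prefix, advertised))
    return update


if __name__ == "__main__":
    table = [("10.1.0.0", 1, "e0"), ("10.4.0.0", 2, "e1")]
    print(poison_reverse_update(table, "e1"))
```

The update is larger than with simple split horizon, but a neighbor that hears the infinite metric discards any route it held back through this router right away.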
Poison reverse thus establishes a single direction through which routes can be reached via a particular interface. Such an interface should not be traversed in the opposite direction to reach a particular destination. Poison reverse ensures this single direction by blocking the other way (by poisoning it with a high cost, such as infinity in the case of RIP). Its effect is best seen in the following situation: once a router discovers it has lost contact with a neighboring router, it will immediately forward a routing update with the inoperable route metric set to infinity. Additionally, the router will broadcast the route, with an infinite metric, for several regular routing update periods to ensure that all other routers on the internetwork have received the information and gradually converge.
Cisco also deploys so-called route poisoning. This technique is used, upon learning about the unreachable destination, to advertise the information on the failed route by sending a route update with an infinite metric.
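Route poisoning can be sketched as follows: when an interface fails, every route through it is marked with the infinite metric and collected into an update to be sent to the neighbors. The dictionary layout and function name are hypothetical, and this is an illustration of the idea rather than a description of Cisco's code.

```python
INFINITY = 16  # IP RIP's unreachable metric


def poison_routes(table, failed_interface, infinity=INFINITY):
    """Mark routes through failed_interface as unreachable.

    table maps prefix -> (metric, interface) and is modified in place.
    Returns the (prefix, metric) pairs to advertise for the failed routes.
    """
    poisoned = []
    for prefix, (metric, interface) in list(table.items()):
        if interface == failed_interface and metric < infinity:
            table[prefix] = (infinity, interface)
            poisoned.append((prefix, infinity))
    return poisoned


if __name__ == "__main__":
    table = {"10.4.0.0": (1, "e1"), "10.1.0.0": (2, "e0")}
    print(poison_routes(table, "e1"))
```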
Poison reverse is usually used in conjunction with split horizon; thus the mechanisms work together to prevent routing loops (a potential danger with distance vector routing). Poison reverse is also used in conjunction with holddown timers.
Holddown is a process in which a router, after receiving destination unreachable information from a neighbor router, will not accept new routing information about that destination for a specified period of time. This prevents regular update messages from inappropriately reinstating a route that has gone bad. It is needed because a device that has yet to be informed of a network failure may send an invalid regular update message (indicating that a route that has just gone down is still good) to a device that has just been notified of the failure. Without holddown, the latter device would then contain (and potentially advertise) incorrect routing information. In other words, holddown means: let the rumors calm down and wait for the truth.
After learning that a route to a destination has failed, a router enters a holddown state while it waits a certain period of time (controlled by a holddown timer) before believing and accepting any other routing information about that destination. This helps prevent transient routing loops caused, for example, by unstable (flapping) routes.
Holddown operates as follows: once a route is marked as unreachable, the router starts the holddown timer instead of the garbage collection timer (discussed later in this Tutorial). A route in holddown, however, is still used for packet forwarding. When a routing update is received for a route in holddown, the update is ignored. As a consequence, the network routers cannot converge on alternative paths until the holddown for the route expires on all relevant routers. When the holddown timer expires, the route goes into garbage collection (unless an update for that route arrives).
A holddown timer tells routers to hold down, for some period of time, any changes that might affect routes recently advertised as unreachable. The holddown period is usually calculated to be just greater than the time necessary to update the entire network with a routing change. Holddown helps prevent the counting-to-infinity problem (a gradually increasing metric caused by routing updates bouncing between neighboring routers that point to one another for a route). An additional benefit of holddown is that it keeps routers from thrashing while attempting to converge, which commonly occurs when a link flaps between operable and inoperable within a short period of time.
Holddown timers help in handling new routing updates for recently announced unreachable networks (marked as such in the routing table) in the following way:
If an update arrives from a different neighboring router with a better metric than originally recorded for the network (before it became unreachable), the router removes the network from unreachable state, uses the new metric for the route, and stops the holddown timer.
If an update with a poorer metric is received from a neighbor other than the originating one, it is ignored (this could be routing information looping in the internetwork before all routers converge, as shown in Figure 3 above).
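The two rules above can be sketched as a small decision function. Several details here are assumptions made for illustration and are not stated in the text: the 180-second holddown value, the treatment of an equal metric as "not better", the choice to ignore updates from the originating neighbor during holddown, and all names. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional

INFINITY = 16           # IP RIP's unreachable metric
HOLDDOWN_SECONDS = 180  # assumed value, for illustration only


@dataclass
class RouteEntry:
    metric: int
    neighbor: str                 # neighbor the route was learned from
    original_metric: int = 0      # metric recorded before the failure
    holddown_until: Optional[float] = None


def start_holddown(entry, now, holddown=HOLDDOWN_SECONDS):
    """Mark the route unreachable and start its holddown timer."""
    entry.original_metric = entry.metric
    entry.metric = INFINITY
    entry.holddown_until = now + holddown


def process_update(entry, neighbor, metric, now):
    """Apply an update for this route; return True if it was accepted."""
    in_holddown = entry.holddown_until is not None and now < entry.holddown_until
    if in_holddown:
        # Assumption: updates from the originating neighbor are ignored too.
        if neighbor == entry.neighbor or metric >= entry.original_metric:
            return False  # possibly stale information still looping around
    elif metric >= entry.metric:
        return False      # no improvement over the current route
    entry.metric = metric
    entry.neighbor = neighbor
    entry.holddown_until = None  # accepting the update stops the timer
    return True


if __name__ == "__main__":
    entry = RouteEntry(metric=3, neighbor="B")
    start_holddown(entry, now=0.0)
    print(process_update(entry, "A", 5, now=10.0))  # poorer metric
    print(process_update(entry, "A", 2, now=20.0))  # better metric
```

In this sketch the first update is rejected because its metric is poorer than the one recorded before the failure, and the second is accepted and ends the holddown.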
While holddown helps inhibit the formation of routing loops, it may have an adverse impact on convergence. Because of this side effect, holddown is not used in all distance vector routing protocols; however, Cisco's implementation of IP RIP does use it.