Friday, March 28, 2014

LDP Session Protection

In a previous post, we discussed how LDP – IGP Synchronization prevents the black holes that can appear after convergence when an LDP session is lost following a node or link failure.
In this post, we will discuss another MPLS feature that helps with fast convergence after a link failure: LDP Session Protection.
In normal LDP operation, when a link between two adjacent neighbors fails, all the labels learned from that neighbor are flushed, since each label determines the outgoing interface for a given packet. What LDP Session Protection does is preserve those labels, so that when the link comes back up the router doesn't have to bind and rebuild all the prefixes learned from that neighbor again.
Session protection can be configured for directly connected neighbors as well as remote neighbors via LDP targeted hellos, which maintain the session while the IGP calculates alternative routes to reach the neighbor. It’s also worth mentioning that, by default, LSRs don't accept targeted hellos. There are several ways to make them accept targeted hellos:
1.      Configure LDP session protection, which automatically creates targeted sessions between directly connected neighbors.
2.      Manually specify the neighbor via the command “mpls ldp neighbor x.x.x.x targeted ldp” on both LSRs.
3.      Configure an LSR to passively accept targeted hellos via the command “mpls ldp discovery targeted-hello accept”, which can be accompanied by an access list for filtering.
4.      AToM pseudowires and Traffic Engineering automatically create a targeted session between the LSRs.
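As a rough sketch, options 1–3 above map to the following IOS commands (the neighbor address and ACL number are illustrative, and exact syntax can vary by IOS release):

```
! Option 1: enable session protection; targeted sessions to directly
! connected LDP neighbors are created automatically
mpls ldp session protection

! Option 2: explicitly target a specific neighbor (configure on both LSRs)
mpls ldp neighbor 12.12.12.12 targeted ldp

! Option 3: passively accept targeted hellos, optionally filtered by an ACL
access-list 10 permit host 12.12.12.12
mpls ldp discovery targeted-hello accept from 10
```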
Let’s check the topology below.





All routers above are running OSPF as the IGP and LDP on all interfaces. Let’s see how R13 sees R12 as an LDP neighbor:
R13#show mpls ldp neighbor 12.12.12.12
Peer LDP Ident: 12.12.12.12:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 12.12.12.12.646 - 13.13.13.13.45475
    State: Oper; Msgs sent/rcvd: 21/22; Downstream
    Up time: 00:01:14
    LDP discovery sources:
      FastEthernet2/0, Src IP addr: 10.12.13.12
    Addresses bound to peer LDP Ident:
      10.8.12.12   12.12.12.12 10.9.12.12   10.6.12.12
      10.12.13.12


I decided to take the prefix 6.6.6.6/32 as an example to clarify that LDP session protection is not only about the prefixes directly connected to a neighbor, but about the bindings for all prefixes received from that neighbor.

Everything looks good with R12. Now let’s check the prefix 6.6.6.6/32 in R13’s binding table:
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 22
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26
The prefix 6.6.6.6/32 is known through two neighbors, R12 and R9; R12 is preferred since the IGP metric through it is lower than the one through R9. Now let’s see what happens when the link between R12 and R13 fails. (I enabled “debug mpls ldp binding” first.)
*Mar 28 00:27:38.003: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 00:27:38.007: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa1/0, nh 10.9.13.9
*Mar 28 00:27:38.011: tib: add a route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label Unknown
*Mar 28 00:27:38.015: tib: update route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), with remote label 22 from 9.9.9.9:0
*Mar 28 00:27:38.019: tagcon: announce labels for: 6.6.6.6/32; nh 10.9.13.9, Fa1/0, inlabel 23, outlabel 22 (from 9.9.9.9:0), get path labels
*Mar 28 00:27:38.023: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label 26 from 12.12.12.12:0
*Mar 28 00:27:38.031: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
*Mar 28 00:27:38.035: tagcon: rib change: 6.6.6.6/255.255.255.255; event 0x4; ndb attrflags 0x1000000; ndb pdb_index 0x2/undef
*Mar 28 00:27:39.003: %LDP-5-NBRCHG: LDP Neighbor 12.12.12.12:0 (2) is DOWN (TCP connection closed by peer)
~omitted some repetitive stuff~
*Mar 28 00:27:47.079: tagcon: tibent(6.6.6.6/32): label 26 from 12.12.12.12:0 removed
~omitted some other repetitive stuff~
*Mar 28 00:27:47.083: tagcon: (default) Deassign peer id; 12.12.12.12:0: id 1
It’s pretty obvious that R13 has just flushed all the labels from R12 and started replacing them with bindings via R9 as the next hop. Let’s check the status of R12 again, along with the prefix 6.6.6.6/32:
R13#show mpls ldp neighbor
Peer LDP Ident: 9.9.9.9:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 9.9.9.9.646 - 13.13.13.13.27049
    State: Oper; Msgs sent/rcvd: 100/94; Downstream
    Up time: 01:05:09
    LDP discovery sources:
         FastEthernet1/0, Src IP addr: 10.9.13.9
    Addresses bound to peer LDP Ident:
         10.8.9.9     9.9.9.9      10.9.10.9    10.9.12.9
         10.9.13.9
R12 is no longer listed as a neighbor, which is totally normal, and 6.6.6.6/32 now has a single remote label binding, from R9:
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 22
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
Now let’s re-enable the link between R12 and R13 and see how the convergence goes.
R13
*Mar 28 00:39:03.767: %LDP-5-NBRCHG: LDP Neighbor 12.12.12.12:0 (2) is UP
*Mar 28 00:39:03.959: tib: 6.6.6.6/32:: learn binding 26 from 12.12.12.12:0
*Mar 28 00:39:03.959: tib: a new binding to be added
*Mar 28 00:39:03.959: tagcon: tibent(6.6.6.6/32): label 26 from 12.12.12.12:0 added
*Mar 28 00:39:03.959: tib: next hop for route 6.6.6.6/32(0, 10.9.13.9, Fa1/0) is not mapped to peer 12.12.12.12:0
*Mar 28 00:39:03.959: tib: skip iprm label announcement for 6.6.6.6/32
*Mar 28 00:39:03.959: tib: 10.6.7.0/24:: learn binding 16 from 12.12.12.12:0
*Mar 28 00:39:03.959: tib: a new binding to be added
*Mar 28 00:39:22.043: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 00:39:22.047: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa2/0, nh 10.12.13.12
*Mar 28 00:39:22.047: tib: add a route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label Unknown
*Mar 28 00:39:22.051: tib: update route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), with remote label 26 from 12.12.12.12:0
*Mar 28 00:39:22.059: tagcon: announce labels for: 6.6.6.6/32; nh 10.12.13.12, Fa2/0, inlabel 23, outlabel 26 (from 12.12.12.12:0), get path labels
*Mar 28 00:39:22.063: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label 22 from 9.9.9.9:0
*Mar 28 00:39:22.067: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
You can see that R13 learned a new binding from R12 for the prefix 6.6.6.6/32 with the value 26. This happens for all the prefixes that R12 knows about and advertises to R13 once the LDP session is established.
Now let’s enable LDP session protection globally on R12 and R13
R13(config)#mpls ldp session protection
R13(config)#
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 -> 12.12.12.12 Req active by client, LDP SP
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 allocated
R13(config)#
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 Set peer start; flags 0x10
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 Defer peer cleanup; cleancnt 1
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 Set peer finished; flags 0x1F
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 ref count incremented to 1
*Mar 28 01:01:45.447: tib: lsd populate for targeted neighbor
*Mar 28 01:01:45.471: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 01:01:45.471: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa2/0, nh 10.12.13.12
*Mar 28 01:01:45.475: tib: found route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label 26 from 12.12.12.12:0
*Mar 28 01:01:45.475: tagcon: announce labels for: 6.6.6.6/32; nh 10.12.13.12, Fa2/0, inlabel 23, outlabel 26 (from 12.12.12.12:0), get path labels
The debug will show the population of all the other prefixes as well, but since 6.6.6.6/32 is the only prefix of interest here, I omitted the rest. The “ldp-trgtnbr” at the beginning of each line indicates that the directly connected neighbors now have an LDP targeted session with R13. Let’s verify that:
R13#show mpls ldp neighbor
Peer LDP Ident: 12.12.12.12:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 12.12.12.12.646 - 13.13.13.13.30831
    State: Oper; Msgs sent/rcvd: 37/37; Downstream
    Up time: 00:15:34
    LDP discovery sources:
      FastEthernet2/0, Src IP addr: 10.12.13.12
      Targeted Hello 13.13.13.13 -> 12.12.12.12, active, passive
    Addresses bound to peer LDP Ident:
         10.8.12.12   12.12.12.12 10.9.12.12   10.6.12.12
         10.12.13.12
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 56
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26
Clearly, there’s a targeted session between R13 and R12 in the active/passive state, which means that R13 has initiated a targeted session from its side (active) and is accepting a targeted session from R12 (passive).
At this moment, R13 has two label bindings for the prefix 6.6.6.6/32. Let’s see what happens when the link between them goes down:
R13#
*Mar 28 01:31:44.515: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 01:31:44.515: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa1/0, nh 10.9.13.9
*Mar 28 01:31:44.519: tib: add a route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label Unknown
*Mar 28 01:31:44.523: tib: update route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), with remote label 22 from 9.9.9.9:0
*Mar 28 01:31:44.527: tagcon: announce labels for: 6.6.6.6/32; nh 10.9.13.9, Fa1/0, inlabel 23, outlabel 22 (from 9.9.9.9:0), get path labels
*Mar 28 01:31:44.531: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label 26 from 12.12.12.12:0
*Mar 28 01:31:44.543: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
*Mar 28 01:31:44.543: tagcon: rib change: 6.6.6.6/255.255.255.255; event 0x4; ndb attrflags 0x1000000; ndb pdb_index 0x2/undef
*Mar 28 01:31:53.815: ldp: Need tfib cleanup for peer FastEthernet2/0; 12.12.12.12:0
*Mar 28 01:31:53.819: tib: lsd cleanup for Fa2/0
*Mar 28 01:31:53.823: %LDP-5-SP: 12.12.12.12:0: session hold up initiated
After updating the labeled packets to be forwarded to R9 instead of R12, R13 didn’t flush the labels like it did the first time; instead, it generated a message stating that session hold up has been initiated.
R13#show mpls ldp neighbor
Peer LDP Ident: 9.9.9.9:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 9.9.9.9.646 - 13.13.13.13.13093
    State: Oper; Msgs sent/rcvd: 68/68; Downstream
    Up time: 00:42:53
    LDP discovery sources:
         FastEthernet1/0, Src IP addr: 10.9.13.9
    Addresses bound to peer LDP Ident:
         10.8.9.9     9.9.9.9      10.9.10.9    10.9.12.9
         10.9.13.9
Peer LDP Ident: 12.12.12.12:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 12.12.12.12.646 - 13.13.13.13.30831
    State: Oper; Msgs sent/rcvd: 71/78; Downstream
    Up time: 00:42:53
    LDP discovery sources:
         Targeted Hello 13.13.13.13 -> 12.12.12.12, active, passive
    Addresses bound to peer LDP Ident:
         10.8.12.12   12.12.12.12 10.9.12.12   10.6.12.12
This time, even though the directly connected interface is down, the targeted session is still up; notice, though, that the physical interface has disappeared from the discovery sources.
Let’s check if the prefix 6.6.6.6/32 is still there:
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 56
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26
The prefix 6.6.6.6/32 is still in the binding table with the same label, 26. There’s something to note here: by default, the binding is held indefinitely, until either the directly connected interface comes back up or the targeted session goes down. This behavior can be changed in case you don’t want the router to keep this information forever, for example because you changed the physical topology but the two routers can still reach each other, in which case it makes no sense for them to maintain the protected session.
Here’s how to change that behavior
R13(config)#mpls ldp session protection duration ?
 <30-2147483>  Holdup time in seconds
 infinite   Protect session forever after loss of link discovery
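For example, to protect sessions for only five minutes after link loss, and optionally only for peers matched by an ACL, something like the following should work (the ACL number and timer value are illustrative; the `for` option and exact syntax may vary by IOS release):

```
! Protect only sessions with peers permitted by ACL 10, for 300 seconds
access-list 10 permit host 12.12.12.12
mpls ldp session protection for 10 duration 300
```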

Now for the good stuff: let’s bring the failed link back up and see what happens.

R13#
*Mar 28 01:46:09.891: tib: lsd populate for Fa2/0
*Mar 28 01:46:09.895: %LDP-5-SP: 12.12.12.12:0: session recovery succeeded
*Mar 28 01:46:09.983: tagcon: omit announce labels for: 6.6.6.6/32; nh 10.9.13.9, Fa1/0, from 12.12.12.12:0: Handle peer addr 10.12.13.12
*Mar 28 01:46:18.435: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 01:46:18.439: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa2/0, nh 10.12.13.12
*Mar 28 01:46:18.439: tib: add a route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label Unknown
*Mar 28 01:46:18.443: tib: update route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), with remote label 26 from 12.12.12.12:0
*Mar 28 01:46:18.451: tagcon: announce labels for: 6.6.6.6/32; nh 10.12.13.12, Fa2/0, inlabel 23, outlabel 26 (from 12.12.12.12:0), get path labels
*Mar 28 01:46:18.451: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label 22 from 9.9.9.9:0
*Mar 28 01:46:18.463: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
*Mar 28 01:46:18.467: tagcon: rib change: 6.6.6.6/255.255.255.255; event 0x4; ndb attrflags 0x1000000; ndb pdb_index 0x2/undef
Compared to the earlier recovery from the link failure without LDP session protection, R13 didn’t receive a new binding for the prefix 6.6.6.6/32; it went directly to updating the next hop for that prefix to point back to R12.

Friday, March 21, 2014

BGP Backdoor


BGP Backdoor is a BGP feature used to override the routing table when the default route installation process doesn’t in fact pick the best way to reach a certain network. Let’s first revisit the route installation process before we go any deeper.

The main function of the routing table is to hold the best route to each network. Routing updates can come through different protocols, and each protocol has an Administrative Distance (AD).

For example, Cisco IOS uses the following Administrative Distances to rank protocol updates.


Directly Connected    0
Static                1
eBGP                  20
EIGRP                 90
OSPF                  110
RIP                   120
EIGRP External        170
iBGP                  200


The lower the AD, the more preferred the route for installation in the routing table.

The router takes a few steps before actually installing routes in the routing table, evaluated in the following order:

1- Longest Prefix Match 
2- Administrative Distance
3- Metric
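The three-step selection above can be modeled in a few lines of Python. This is a simplified sketch (the route list and destination address are illustrative), not how IOS actually implements RIB installation:

```python
import ipaddress

# Candidate routes as (prefix, administrative distance, metric, next hop).
# ADs follow the Cisco defaults from the table above: eBGP 20, OSPF 110.
routes = [
    ("222.222.222.0/24", 20, 0, "10.0.13.1"),    # eBGP route via R1
    ("222.222.222.0/24", 110, 20, "10.0.23.2"),  # OSPF route via R2
    ("0.0.0.0/0", 1, 0, "10.0.13.1"),            # a static default, for contrast
]

def best_route(dest, candidates):
    """Select a route: longest prefix match first, then lowest AD, then lowest metric."""
    addr = ipaddress.ip_address(dest)
    matches = [r for r in candidates if addr in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    # Longest prefix wins (descending prefixlen), then AD, then metric (ascending).
    return min(matches, key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, r[1], r[2]))

print(best_route("222.222.222.1", routes))
# → ('222.222.222.0/24', 20, 0, '10.0.13.1') — eBGP wins on AD despite the worse path
```

Both /24 candidates tie on prefix length, so AD breaks the tie and eBGP wins, which is exactly the problem the topology below illustrates.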

To make it clearer, let’s look at this topology



Let’s say two companies decide to merge and both of them need to exchange routes; before the merger, they communicated through a service provider.

Now, after the merger, they installed a direct link between them running OSPF as the IGP. Let’s take the network 222.222.222.0/24 as an example of the problem we’re facing here.

R2 is advertising the network 222.222.222.0/24 via eBGP to R1 and via OSPF to R3. In turn, R1 propagates 222.222.222.0/24 to R3, since it’s in a different AS than R2. Now R3 has to decide which route source is better:

From R1: AD 20,  link capacity 1.5 Mbps, 2 hops
From R2: AD 110, link capacity 100 Mbps, 1 hop

Clearly the route directly through R2 is better, but R3 will still prefer the path through R1, since the eBGP AD is lower than the OSPF AD.

Let’s see what’s on R3

R3#show ip route

B    222.222.222.0/24 [20/0] via 10.0.13.1, 00:00:47
     2.0.0.0/32 is subnetted, 1 subnets
O       2.2.2.2 [110/11] via 10.0.23.2, 00:06:05, FastEthernet0/1
     3.0.0.0/32 is subnetted, 1 subnets
C       3.3.3.3 is directly connected, Loopback0
     10.0.0.0/24 is subnetted, 3 subnets
O       10.0.12.0 [110/74] via 10.0.23.2, 00:06:05, FastEthernet0/1
C       10.0.13.0 is directly connected, Serial0/0
C       10.0.23.0 is directly connected, FastEthernet0/1



R3#show ip bgp
BGP table version is 4, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 222.222.222.0    10.0.13.1                              0 1 65000 i

Now let’s fix that by enabling BGP backdoor on R3 (I also enabled “debug ip routing”):

R3(config)#router bgp 65001
R3(config-router)#network 222.222.222.0 mask 255.255.255.0 backdoor


*Mar  1 00:18:34.647: RT: del 222.222.222.0 via 10.0.13.1, bgp metric [20/0]
*Mar  1 00:18:34.647: RT: delete network route to 222.222.222.0
*Mar  1 00:18:34.651: RT: NET-RED 222.222.222.0/24
*Mar  1 00:18:34.675: RT: add 222.222.222.0/24 via 10.0.23.2, ospf metric [110/20]
*Mar  1 00:18:34.679: RT: NET-RED 222.222.222.0/24


Checking the routing table and BGP table

R3#show ip bgp
BGP table version is 7, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
r> 222.222.222.0    10.0.13.1                              0 1 65000 i

The asterisk in front of the prefix is now gone, and there’s a lowercase “r” indicating a RIB failure, which essentially means that the routing table holds a route that is preferred over the one received via BGP.
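As a side note, IOS can list such prefixes with a dedicated show command; its output should name 222.222.222.0/24 along with the reason it lost to the RIB (here, a route with a lower administrative distance):

```
R3#show ip bgp rib-failure
```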

Here’s the routing table of R3

R3#show ip route
O    222.222.222.0/24 [110/20] via 10.0.23.2, 00:04:45, FastEthernet0/1
     2.0.0.0/32 is subnetted, 1 subnets
O       2.2.2.2 [110/11] via 10.0.23.2, 00:20:11, FastEthernet0/1
     3.0.0.0/32 is subnetted, 1 subnets
C       3.3.3.3 is directly connected, Loopback0
     10.0.0.0/24 is subnetted, 3 subnets
O       10.0.12.0 [110/74] via 10.0.23.2, 00:20:11, FastEthernet0/1
C       10.0.13.0 is directly connected, Serial0/0
C       10.0.23.0 is directly connected, FastEthernet0/1



R3 is now using the “actual” better route to reach the 222.222.222.0/24 prefix behind R2.