Friday, March 28, 2014

LDP Session Protection

In a previous post, we discussed how LDP – IGP Synchronization prevents the black holes that can appear after convergence when an LDP session is lost following a node or link failure.
In this post, we will discuss another MPLS feature that helps with fast convergence after a link failure: LDP Session Protection.
In normal LDP operation, when a link between two adjacent neighbors fails, all the labels learned from that neighbor are flushed, since each label determines the outgoing interface for a given packet. What LDP Session Protection does is preserve those labels, so that when the link comes back up the router doesn't have to bind and rebuild all the prefixes learned from that neighbor again.
Session protection can be configured for directly connected neighbors as well as remote neighbors via LDP targeted hellos, which maintain the session while the IGP calculates alternative routes to reach the neighbor. It’s also worth mentioning that, by default, LSRs don't accept targeted hellos. There are several ways to make them accept targeted hellos:
1.      Configure LDP session protection, which automatically creates targeted sessions between directly connected neighbors.
2.      Manually specify the neighbor via the command “mpls ldp neighbor x.x.x.x targeted ldp” on both LSRs.
3.      Configure an LSR to passively accept targeted hellos via the command “mpls ldp discovery targeted-hello accept”, which can be accompanied by an access list for filtering.
4.      AToM pseudowires and Traffic Engineering automatically create a targeted session between the LSRs.
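As a rough sketch, options 1–3 above map to the following IOS commands (the neighbor address and ACL number are illustrative, and exact syntax can vary by IOS release):

```
! Option 1: enable session protection; targeted sessions to directly
! connected LDP neighbors are created automatically
mpls ldp session protection

! Option 2: explicitly target a specific neighbor (configure on both LSRs)
mpls ldp neighbor 12.12.12.12 targeted ldp

! Option 3: passively accept targeted hellos, optionally filtered by an ACL
access-list 10 permit host 12.12.12.12
mpls ldp discovery targeted-hello accept from 10
```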
Let’s check the topology below.





All routers above are running OSPF as the IGP and LDP on all interfaces. Let’s see how R13 sees R12 as an LDP neighbor:
R13#show mpls ldp neighbor 12.12.12.12
Peer LDP Ident: 12.12.12.12:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 12.12.12.12.646 - 13.13.13.13.45475
    State: Oper; Msgs sent/rcvd: 21/22; Downstream
    Up time: 00:01:14
    LDP discovery sources:
      FastEthernet2/0, Src IP addr: 10.12.13.12
    Addresses bound to peer LDP Ident:
      10.8.12.12   12.12.12.12 10.9.12.12   10.6.12.12
      10.12.13.12


I decided to take the prefix 6.6.6.6/32 as an example to clarify that LDP session protection is not only about the prefixes directly connected to a neighbor, but about the bindings for all prefixes received from that neighbor.

Everything looks good with R12. Now let’s check the prefix 6.6.6.6/32 in R13’s binding table:
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 22
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26
The prefix 6.6.6.6/32 is known through two neighbors, R12 and R9; R12 is preferred since the IGP metric through it is lower than the one through R9. Now let’s see what happens when the link between R12 and R13 fails. (I enabled “debug mpls ldp binding” first.)
*Mar 28 00:27:38.003: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 00:27:38.007: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa1/0, nh 10.9.13.9
*Mar 28 00:27:38.011: tib: add a route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label Unknown
*Mar 28 00:27:38.015: tib: update route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), with remote label 22 from 9.9.9.9:0
*Mar 28 00:27:38.019: tagcon: announce labels for: 6.6.6.6/32; nh 10.9.13.9, Fa1/0, inlabel 23, outlabel 22 (from 9.9.9.9:0), get path labels
*Mar 28 00:27:38.023: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label 26 from 12.12.12.12:0
*Mar 28 00:27:38.031: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
*Mar 28 00:27:38.035: tagcon: rib change: 6.6.6.6/255.255.255.255; event 0x4; ndb attrflags 0x1000000; ndb pdb_index 0x2/undef
*Mar 28 00:27:39.003: %LDP-5-NBRCHG: LDP Neighbor 12.12.12.12:0 (2) is DOWN (TCP connection closed by peer)
~omitted some repetitive stuff~
*Mar 28 00:27:47.079: tagcon: tibent(6.6.6.6/32): label 26 from 12.12.12.12:0 removed
~omitted some other repetitive stuff~
*Mar 28 00:27:47.083: tagcon: (default) Deassign peer id; 12.12.12.12:0: id 1
It’s pretty obvious that R13 has just flushed all the labels from R12 and started replacing them with bindings via R9 as the next hop. Let’s check the status of R12 again, along with the prefix 6.6.6.6/32:
R13#show mpls ldp neighbor
Peer LDP Ident: 9.9.9.9:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 9.9.9.9.646 - 13.13.13.13.27049
    State: Oper; Msgs sent/rcvd: 100/94; Downstream
    Up time: 01:05:09
    LDP discovery sources:
         FastEthernet1/0, Src IP addr: 10.9.13.9
    Addresses bound to peer LDP Ident:
         10.8.9.9     9.9.9.9      10.9.10.9    10.9.12.9
         10.9.13.9
R12 is no longer listed as a neighbor, which is totally normal, and 6.6.6.6/32 now has a single remote label binding, from R9:
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 22
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
Now let’s re-enable the link between R12 and R13 and see how the convergence goes.
R13
*Mar 28 00:39:03.767: %LDP-5-NBRCHG: LDP Neighbor 12.12.12.12:0 (2) is UP
*Mar 28 00:39:03.959: tib: 6.6.6.6/32:: learn binding 26 from 12.12.12.12:0
*Mar 28 00:39:03.959: tib: a new binding to be added
*Mar 28 00:39:03.959: tagcon: tibent(6.6.6.6/32): label 26 from 12.12.12.12:0 added
*Mar 28 00:39:03.959: tib: next hop for route 6.6.6.6/32(0, 10.9.13.9, Fa1/0) is not mapped to peer 12.12.12.12:0
*Mar 28 00:39:03.959: tib: skip iprm label announcement for 6.6.6.6/32
*Mar 28 00:39:03.959: tib: 10.6.7.0/24:: learn binding 16 from 12.12.12.12:0
*Mar 28 00:39:03.959: tib: a new binding to be added
*Mar 28 00:39:22.043: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 00:39:22.047: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa2/0, nh 10.12.13.12
*Mar 28 00:39:22.047: tib: add a route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label Unknown
*Mar 28 00:39:22.051: tib: update route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), with remote label 26 from 12.12.12.12:0
*Mar 28 00:39:22.059: tagcon: announce labels for: 6.6.6.6/32; nh 10.12.13.12, Fa2/0, inlabel 23, outlabel 26 (from 12.12.12.12:0), get path labels
*Mar 28 00:39:22.063: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label 22 from 9.9.9.9:0
*Mar 28 00:39:22.067: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
You can see that R13 learned a new binding from R12 for the prefix 6.6.6.6/32 with the value 26. This happens for all the prefixes that R12 knows about and advertises to R13 once the LDP session is established.
Now let’s enable LDP session protection globally on R12 and R13
R13(config)#mpls ldp session protection
R13(config)#
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 -> 12.12.12.12 Req active by client, LDP SP
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 allocated
R13(config)#
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 Set peer start; flags 0x10
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 Defer peer cleanup; cleancnt 1
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 Set peer finished; flags 0x1F
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 ref count incremented to 1
*Mar 28 01:01:45.447: tib: lsd populate for targeted neighbor
*Mar 28 01:01:45.471: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 01:01:45.471: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa2/0, nh 10.12.13.12
*Mar 28 01:01:45.475: tib: found route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label 26 from 12.12.12.12:0
*Mar 28 01:01:45.475: tagcon: announce labels for: 6.6.6.6/32; nh 10.12.13.12, Fa2/0, inlabel 23, outlabel 26 (from 12.12.12.12:0), get path labels
The debug will show the population of all the other prefixes as well, but since 6.6.6.6/32 is the only prefix of interest here, I omitted the rest. The “ldp-trgtnbr” at the beginning of each line indicates that the directly connected neighbors now have an LDP targeted session with R13. Let’s verify that:
R13#show mpls ldp neighbor
Peer LDP Ident: 12.12.12.12:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 12.12.12.12.646 - 13.13.13.13.30831
    State: Oper; Msgs sent/rcvd: 37/37; Downstream
    Up time: 00:15:34
    LDP discovery sources:
      FastEthernet2/0, Src IP addr: 10.12.13.12
      Targeted Hello 13.13.13.13 -> 12.12.12.12, active, passive
    Addresses bound to peer LDP Ident:
         10.8.12.12   12.12.12.12 10.9.12.12   10.6.12.12
         10.12.13.12
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 56
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26
Clearly, there’s a targeted session between R13 and R12 in the active/passive state, which means that R13 has initiated a targeted session from its side (active) and is accepting a targeted session from R12 (passive).
At this moment, R13 has two label bindings for the prefix 6.6.6.6/32. Let’s see what happens when the link between them goes down:
R13#
*Mar 28 01:31:44.515: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 01:31:44.515: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa1/0, nh 10.9.13.9
*Mar 28 01:31:44.519: tib: add a route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label Unknown
*Mar 28 01:31:44.523: tib: update route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), with remote label 22 from 9.9.9.9:0
*Mar 28 01:31:44.527: tagcon: announce labels for: 6.6.6.6/32; nh 10.9.13.9, Fa1/0, inlabel 23, outlabel 22 (from 9.9.9.9:0), get path labels
*Mar 28 01:31:44.531: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label 26 from 12.12.12.12:0
*Mar 28 01:31:44.543: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
*Mar 28 01:31:44.543: tagcon: rib change: 6.6.6.6/255.255.255.255; event 0x4; ndb attrflags 0x1000000; ndb pdb_index 0x2/undef
*Mar 28 01:31:53.815: ldp: Need tfib cleanup for peer FastEthernet2/0; 12.12.12.12:0
*Mar 28 01:31:53.819: tib: lsd cleanup for Fa2/0
*Mar 28 01:31:53.823: %LDP-5-SP: 12.12.12.12:0: session hold up initiated
After updating the labeled packets to be forwarded to R9 instead of R12, R13 didn’t flush the labels like it did the first time; instead, it generated a message stating that session hold up has been initiated.
R13#show mpls ldp neighbor
Peer LDP Ident: 9.9.9.9:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 9.9.9.9.646 - 13.13.13.13.13093
    State: Oper; Msgs sent/rcvd: 68/68; Downstream
    Up time: 00:42:53
    LDP discovery sources:
         FastEthernet1/0, Src IP addr: 10.9.13.9
    Addresses bound to peer LDP Ident:
         10.8.9.9     9.9.9.9      10.9.10.9    10.9.12.9
         10.9.13.9
Peer LDP Ident: 12.12.12.12:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 12.12.12.12.646 - 13.13.13.13.30831
    State: Oper; Msgs sent/rcvd: 71/78; Downstream
    Up time: 00:42:53
    LDP discovery sources:
         Targeted Hello 13.13.13.13 -> 12.12.12.12, active, passive
    Addresses bound to peer LDP Ident:
         10.8.12.12   12.12.12.12 10.9.12.12   10.6.12.12
This time, even though the directly connected interface is down, the targeted session is still up; notice, though, that the physical interface has disappeared from the discovery sources.
Let’s check if the prefix 6.6.6.6/32 is still there:
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 56
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26
The prefix 6.6.6.6/32 is still in the binding table with the same label, 26. There’s something to note here: by default, the binding is held indefinitely, until either the directly connected interface comes back up or the targeted session goes down. This behavior can be changed in case you don’t want the router to keep this information forever, for example because you changed the physical topology but the two routers can still reach each other, in which case it makes no sense for them to maintain the protected session.
Here’s how to change that behavior
R13(config)#mpls ldp session protection duration ?
 <30-2147483>  Holdup time in seconds
 infinite   Protect session forever after loss of link discovery
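For example, to protect sessions for only five minutes after link loss, and optionally only for peers matched by an ACL, something like the following should work (the ACL number and timer value are illustrative; the `for` option and exact syntax may vary by IOS release):

```
! Protect only sessions with peers permitted by ACL 10, for 300 seconds
access-list 10 permit host 12.12.12.12
mpls ldp session protection for 10 duration 300
```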

Now for the good stuff: let’s bring the failed link back up and see what happens.

R13#
*Mar 28 01:46:09.891: tib: lsd populate for Fa2/0
*Mar 28 01:46:09.895: %LDP-5-SP: 12.12.12.12:0: session recovery succeeded
*Mar 28 01:46:09.983: tagcon: omit announce labels for: 6.6.6.6/32; nh 10.9.13.9, Fa1/0, from 12.12.12.12:0: Handle peer addr 10.12.13.12
*Mar 28 01:46:18.435: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 01:46:18.439: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa2/0, nh 10.12.13.12
*Mar 28 01:46:18.439: tib: add a route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label Unknown
*Mar 28 01:46:18.443: tib: update route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), with remote label 26 from 12.12.12.12:0
*Mar 28 01:46:18.451: tagcon: announce labels for: 6.6.6.6/32; nh 10.12.13.12, Fa2/0, inlabel 23, outlabel 26 (from 12.12.12.12:0), get path labels
*Mar 28 01:46:18.451: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label 22 from 9.9.9.9:0
*Mar 28 01:46:18.463: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
*Mar 28 01:46:18.467: tagcon: rib change: 6.6.6.6/255.255.255.255; event 0x4; ndb attrflags 0x1000000; ndb pdb_index 0x2/undef
Compared to the earlier recovery from the link failure without LDP session protection, R13 didn’t receive a new binding for the prefix 6.6.6.6/32; it went directly to updating the next hop for that prefix to point back to R12.

Friday, March 21, 2014

BGP Backdoor


BGP Backdoor is a BGP feature used to override the routing table when the default route installation process doesn’t in fact pick the best way to reach a certain network. Let’s first revisit the route installation process before we go any deeper.

The main function of the routing table is to hold the best route to each network. Routing updates can come through different protocols, and each protocol has an Administrative Distance (AD).

For example, Cisco IOS uses the following Administrative Distances to rank protocol updates.


Directly Connected    0
Static                1
eBGP                  20
EIGRP                 90
OSPF                  110
RIP                   120
EIGRP External        170
iBGP                  200


The lower the AD, the more preferred the route for installation in the routing table.

The router takes a few steps before actually installing routes in the routing table, evaluated in the following order:

1- Longest Prefix Match 
2- Administrative Distance
3- Metric
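The three-step selection above can be modeled in a few lines of Python. This is a simplified sketch (the route list and destination address are illustrative), not how IOS actually implements RIB installation:

```python
import ipaddress

# Candidate routes as (prefix, administrative distance, metric, next hop).
# ADs follow the Cisco defaults from the table above: eBGP 20, OSPF 110.
routes = [
    ("222.222.222.0/24", 20, 0, "10.0.13.1"),    # eBGP route via R1
    ("222.222.222.0/24", 110, 20, "10.0.23.2"),  # OSPF route via R2
    ("0.0.0.0/0", 1, 0, "10.0.13.1"),            # a static default, for contrast
]

def best_route(dest, candidates):
    """Select a route: longest prefix match first, then lowest AD, then lowest metric."""
    addr = ipaddress.ip_address(dest)
    matches = [r for r in candidates if addr in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    # Longest prefix wins (descending prefixlen), then AD, then metric (ascending).
    return min(matches, key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, r[1], r[2]))

print(best_route("222.222.222.1", routes))
# → ('222.222.222.0/24', 20, 0, '10.0.13.1') — eBGP wins on AD despite the worse path
```

Both /24 candidates tie on prefix length, so AD breaks the tie and eBGP wins, which is exactly the problem the topology below illustrates.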

To make it clearer, let’s look at this topology



Let’s say two companies decide to merge and both of them need to exchange routes; before the merger, they communicated through a service provider.

Now, after the merger, they installed a direct link between them running OSPF as the IGP. Let’s take the network 222.222.222.0/24 as an example of the problem we’re facing here.

R2 is advertising the network 222.222.222.0/24 via eBGP to R1 and via OSPF to R3. In turn, R1 propagates 222.222.222.0/24 to R3, since it’s in a different AS than R2. Now R3 has to decide which route source is better:

From R1: AD 20,  link capacity 1.5 Mbps, 2 hops
From R2: AD 110, link capacity 100 Mbps, 1 hop

Clearly the route directly through R2 is better, but R3 will still prefer the path through R1, since the eBGP AD is lower than the OSPF AD.

Let’s see what’s on R3

R3#show ip route

B    222.222.222.0/24 [20/0] via 10.0.13.1, 00:00:47
     2.0.0.0/32 is subnetted, 1 subnets
O       2.2.2.2 [110/11] via 10.0.23.2, 00:06:05, FastEthernet0/1
     3.0.0.0/32 is subnetted, 1 subnets
C       3.3.3.3 is directly connected, Loopback0
     10.0.0.0/24 is subnetted, 3 subnets
O       10.0.12.0 [110/74] via 10.0.23.2, 00:06:05, FastEthernet0/1
C       10.0.13.0 is directly connected, Serial0/0
C       10.0.23.0 is directly connected, FastEthernet0/1



R3#show ip bgp
BGP table version is 4, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 222.222.222.0    10.0.13.1                              0 1 65000 i

Now let’s fix that by enabling BGP backdoor on R3 (I also enabled “debug ip routing”):

R3(config)#router bgp 65001
R3(config-router)#network 222.222.222.0 mask 255.255.255.0 backdoor


*Mar  1 00:18:34.647: RT: del 222.222.222.0 via 10.0.13.1, bgp metric [20/0]
*Mar  1 00:18:34.647: RT: delete network route to 222.222.222.0
*Mar  1 00:18:34.651: RT: NET-RED 222.222.222.0/24
*Mar  1 00:18:34.675: RT: add 222.222.222.0/24 via 10.0.23.2, ospf metric [110/20]
*Mar  1 00:18:34.679: RT: NET-RED 222.222.222.0/24


Checking the routing table and BGP table

R3#show ip bgp
BGP table version is 7, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
r> 222.222.222.0    10.0.13.1                              0 1 65000 i

The asterisk in front of the prefix is now gone, and there’s a lowercase “r” indicating a RIB failure, which essentially means that the routing table holds a route that is preferred over the one received via BGP.
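As a side note, IOS can list such prefixes with a dedicated show command; its output should name 222.222.222.0/24 along with the reason it lost to the RIB (here, a route with a lower administrative distance):

```
R3#show ip bgp rib-failure
```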

Here’s the routing table of R3

R3#show ip route
O    222.222.222.0/24 [110/20] via 10.0.23.2, 00:04:45, FastEthernet0/1
     2.0.0.0/32 is subnetted, 1 subnets
O       2.2.2.2 [110/11] via 10.0.23.2, 00:20:11, FastEthernet0/1
     3.0.0.0/32 is subnetted, 1 subnets
C       3.3.3.3 is directly connected, Loopback0
     10.0.0.0/24 is subnetted, 3 subnets
O       10.0.12.0 [110/74] via 10.0.23.2, 00:20:11, FastEthernet0/1
C       10.0.13.0 is directly connected, Serial0/0
C       10.0.23.0 is directly connected, FastEthernet0/1



R3 is now using the “actual” better route to reach the 222.222.222.0/24 prefix behind R2.