Friday, March 28, 2014

LDP Session Protection

In a previous post, we discussed how LDP-IGP Synchronization prevents the black holes that can appear after convergence when an LDP session is lost following a node or link failure.
In this post, we will discuss another MPLS feature that helps convergence after a link failure: LDP Session Protection.
In normal LDP operation, when the link between two adjacent neighbors fails, the LDP session goes down and all the label bindings learned from that neighbor are flushed. LDP Session Protection preserves those bindings so that, when the link comes back up, the router doesn't have to re-establish the session and relearn the bindings for all the prefixes advertised by that neighbor.
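To make the difference concrete, here is a minimal sketch in Python of a toy binding table. It is only a mental model, not how IOS implements LDP: the class `ToyLsr`, its methods, and the `peer_still_reachable` flag are all hypothetical names, and the peer IDs and labels are borrowed from the lab later in this post.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLsr:
    """Toy model: remote label bindings keyed by peer, then by prefix."""
    session_protection: bool = False
    remote: dict = field(default_factory=dict)

    def learn(self, peer, prefix, label):
        # Store a remote label binding advertised by an LDP peer.
        self.remote.setdefault(peer, {})[prefix] = label

    def link_down(self, peer, peer_still_reachable=True):
        # Without protection, or when the IGP has no alternative path for the
        # targeted hellos, the session drops and the peer's bindings are flushed.
        if not (self.session_protection and peer_still_reachable):
            self.remote.pop(peer, None)

    def bindings_for(self, prefix):
        # Remote bindings currently held for one prefix, as {peer: label}.
        return {peer: labels[prefix]
                for peer, labels in self.remote.items() if prefix in labels}


def demo(session_protection):
    lsr = ToyLsr(session_protection=session_protection)
    lsr.learn("12.12.12.12:0", "6.6.6.6/32", 26)
    lsr.learn("9.9.9.9:0", "6.6.6.6/32", 22)
    lsr.link_down("12.12.12.12:0")
    return lsr.bindings_for("6.6.6.6/32")
```

Calling `demo(False)` leaves only the binding from 9.9.9.9:0, while `demo(True)` keeps both, which is the behavior the rest of this post verifies on real routers.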
Session protection can be configured for directly connected neighbors and for remote neighbors as well. It relies on LDP targeted hellos, which keep the session alive over whatever alternative route the IGP calculates to reach that neighbor. It’s also worth mentioning that, by default, LSRs don't accept targeted hellos. There are several ways to make them accept them:
1.      Configure LDP session protection on the directly connected neighbors, which automatically creates targeted sessions between them.
2.      Manually specify the neighbor with the command “mpls ldp neighbor x.x.x.x targeted ldp” on both LSRs.
3.      Configure an LSR to passively accept targeted hellos with the command “mpls ldp discovery targeted-hello accept”, which can be combined with an access list for filtering.
4.      AToM pseudowires and Traffic Engineering automatically create a targeted session between the LSRs.
Let’s check the topology below.





All routers above are running OSPF as the IGP and LDP on all interfaces. Let’s see how R13 sees R12 as an LDP neighbor:
R13#show mpls ldp neighbor 12.12.12.12
Peer LDP Ident: 12.12.12.12:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 12.12.12.12.646 - 13.13.13.13.45475
    State: Oper; Msgs sent/rcvd: 21/22; Downstream
    Up time: 00:01:14
    LDP discovery sources:
      FastEthernet2/0, Src IP addr: 10.12.13.12
    Addresses bound to peer LDP Ident:
      10.8.12.12   12.12.12.12 10.9.12.12   10.6.12.12
      10.12.13.12


I decided to take the prefix 6.6.6.6/32 as an example to make it clear that LDP session protection is not only about the prefixes directly connected to a neighbor, but about all the bindings for all the prefixes received from that neighbor.

Everything looks good with R12. Now let’s check the prefix 6.6.6.6/32 in R13’s binding table:
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 22
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26
The prefix 6.6.6.6/32 is known through two neighbors, R12 and R9. R12 is preferred since the IGP metric through it is lower than the one through R9. Now let’s see what happens when the link between R12 and R13 fails. (I enabled “debug mpls ldp bindings” first.)
*Mar 28 00:27:38.003: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 00:27:38.007: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa1/0, nh 10.9.13.9
*Mar 28 00:27:38.011: tib: add a route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label Unknown
*Mar 28 00:27:38.015: tib: update route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), with remote label 22 from 9.9.9.9:0
*Mar 28 00:27:38.019: tagcon: announce labels for: 6.6.6.6/32; nh 10.9.13.9, Fa1/0, inlabel 23, outlabel 22 (from 9.9.9.9:0), get path labels
*Mar 28 00:27:38.023: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label 26 from 12.12.12.12:0
*Mar 28 00:27:38.031: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
*Mar 28 00:27:38.035: tagcon: rib change: 6.6.6.6/255.255.255.255; event 0x4; ndb attrflags 0x1000000; ndb pdb_index 0x2/undef
*Mar 28 00:27:39.003: %LDP-5-NBRCHG: LDP Neighbor 12.12.12.12:0 (2) is DOWN (TCP connection closed by peer)
~omitted some repetitive stuff~
*Mar 28 00:27:47.079: tagcon: tibent(6.6.6.6/32): label 26 from 12.12.12.12:0 removed
~omitted some other repetitive stuff~
*Mar 28 00:27:47.083: tagcon: (default) Deassign peer id; 12.12.12.12:0: id 1
It’s pretty obvious that R13 has just flushed all the labels learned from R12 and started using R9 as the next hop instead. Let’s check the status of R12 and the prefix 6.6.6.6/32 again:
R13#show mpls ldp neighbor
Peer LDP Ident: 9.9.9.9:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 9.9.9.9.646 - 13.13.13.13.27049
    State: Oper; Msgs sent/rcvd: 100/94; Downstream
    Up time: 01:05:09
    LDP discovery sources:
         FastEthernet1/0, Src IP addr: 10.9.13.9
    Addresses bound to peer LDP Ident:
         10.8.9.9     9.9.9.9      10.9.10.9    10.9.12.9
         10.9.13.9
R12 is no longer listed as a neighbor, which is expected, and 6.6.6.6/32 now has only one remote label binding, from R9:
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 22
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
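If you want to compare the two captures programmatically rather than by eye, the following is a minimal Python sketch. It assumes the output layout shown in this post (other IOS releases may format it differently), and the function names `parse_bindings` and `lost_peers` are hypothetical helpers, not part of any Cisco tool. Labels are kept as strings because a label field is not always numeric.

```python
import re

LIB_ENTRY = re.compile(r"lib entry:\s*(\S+),\s*rev\s+(\d+)")
LOCAL = re.compile(r"local binding:\s*label:\s*(\S+)")
REMOTE = re.compile(r"remote binding:\s*lsr:\s*(\S+),\s*label:\s*(\S+)")

# The two captures from this post: before and after the link failure.
BEFORE = """ lib entry: 6.6.6.6/32, rev 22
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26"""

AFTER = """ lib entry: 6.6.6.6/32, rev 22
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22"""


def parse_bindings(output):
    """Return {prefix: {"local": label, "remote": {lsr: label}}}."""
    table = {}
    current = None
    for line in output.splitlines():
        if m := LIB_ENTRY.search(line):
            current = table.setdefault(m.group(1), {"local": None, "remote": {}})
        elif current is None:
            continue  # ignore anything before the first lib entry
        elif m := LOCAL.search(line):
            current["local"] = m.group(1)
        elif m := REMOTE.search(line):
            current["remote"][m.group(1)] = m.group(2)
    return table


def lost_peers(before, after, prefix):
    """Peers whose remote binding for `prefix` disappeared between two captures."""
    empty = {"remote": {}}
    old = parse_bindings(before).get(prefix, empty)["remote"]
    new = parse_bindings(after).get(prefix, empty)["remote"]
    return sorted(set(old) - set(new))
```

With the captures above, `lost_peers(BEFORE, AFTER, "6.6.6.6/32")` reports 12.12.12.12:0 as the peer whose binding was flushed.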
Now let’s re-enable the link between R12 and R13 and see how the convergence goes:
R13
*Mar 28 00:39:03.767: %LDP-5-NBRCHG: LDP Neighbor 12.12.12.12:0 (2) is UP
*Mar 28 00:39:03.959: tib: 6.6.6.6/32:: learn binding 26 from 12.12.12.12:0
*Mar 28 00:39:03.959: tib: a new binding to be added
*Mar 28 00:39:03.959: tagcon: tibent(6.6.6.6/32): label 26 from 12.12.12.12:0 added
*Mar 28 00:39:03.959: tib: next hop for route 6.6.6.6/32(0, 10.9.13.9, Fa1/0) is not mapped to peer 12.12.12.12:0
*Mar 28 00:39:03.959: tib: skip iprm label announcement for 6.6.6.6/32
*Mar 28 00:39:03.959: tib: 10.6.7.0/24:: learn binding 16 from 12.12.12.12:0
*Mar 28 00:39:03.959: tib: a new binding to be added
*Mar 28 00:39:22.043: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 00:39:22.047: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa2/0, nh 10.12.13.12
*Mar 28 00:39:22.047: tib: add a route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label Unknown
*Mar 28 00:39:22.051: tib: update route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), with remote label 26 from 12.12.12.12:0
*Mar 28 00:39:22.059: tagcon: announce labels for: 6.6.6.6/32; nh 10.12.13.12, Fa2/0, inlabel 23, outlabel 26 (from 12.12.12.12:0), get path labels
*Mar 28 00:39:22.063: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label 22 from 9.9.9.9:0
*Mar 28 00:39:22.067: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
You can see that R13 learned a new binding from R12 for the prefix 6.6.6.6/32, and the value of that binding is 26. This happens for all the prefixes that R12 knows about and advertises to R13 when the LDP session is established.
Now let’s enable LDP session protection globally on R12 and R13
R13(config)#mpls ldp session protection
R13(config)#
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 -> 12.12.12.12 Req active by client, LDP SP
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 allocated
R13(config)#
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 Set peer start; flags 0x10
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 Defer peer cleanup; cleancnt 1
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 Set peer finished; flags 0x1F
*Mar 28 01:01:45.331: ldp-trgtnbr: 12.12.12.12 ref count incremented to 1
*Mar 28 01:01:45.447: tib: lsd populate for targeted neighbor
*Mar 28 01:01:45.471: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 01:01:45.471: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa2/0, nh 10.12.13.12
*Mar 28 01:01:45.475: tib: found route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label 26 from 12.12.12.12:0
*Mar 28 01:01:45.475: tagcon: announce labels for: 6.6.6.6/32; nh 10.12.13.12, Fa2/0, inlabel 23, outlabel 26 (from 12.12.12.12:0), get path labels
The debug shows the population for all the other prefixes as well, but since 6.6.6.6/32 is the only prefix of interest here, I omitted the rest. The “ldp-trgtnbr” at the beginning of each line indicates that the directly connected neighbor now has an LDP targeted session with R13. Let’s verify that:
R13#show mpls ldp neighbor
Peer LDP Ident: 12.12.12.12:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 12.12.12.12.646 - 13.13.13.13.30831
    State: Oper; Msgs sent/rcvd: 37/37; Downstream
    Up time: 00:15:34
    LDP discovery sources:
      FastEthernet2/0, Src IP addr: 10.12.13.12
      Targeted Hello 13.13.13.13 -> 12.12.12.12, active, passive
    Addresses bound to peer LDP Ident:
         10.8.12.12   12.12.12.12 10.9.12.12   10.6.12.12
         10.12.13.12
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 56
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26
Clearly, there’s a targeted session between R13 and R12 showing both the active and passive states, which means that R13 has initiated a targeted session from its side (active) and is accepting a targeted session from R12 (passive).
At this moment, R13 has two label bindings for the prefix 6.6.6.6/32. Let’s see what happens when the link between R13 and R12 goes down:
R13#
*Mar 28 01:31:44.515: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 01:31:44.515: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa1/0, nh 10.9.13.9
*Mar 28 01:31:44.519: tib: add a route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label Unknown
*Mar 28 01:31:44.523: tib: update route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), with remote label 22 from 9.9.9.9:0
*Mar 28 01:31:44.527: tagcon: announce labels for: 6.6.6.6/32; nh 10.9.13.9, Fa1/0, inlabel 23, outlabel 22 (from 9.9.9.9:0), get path labels
*Mar 28 01:31:44.531: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label 26 from 12.12.12.12:0
*Mar 28 01:31:44.543: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
*Mar 28 01:31:44.543: tagcon: rib change: 6.6.6.6/255.255.255.255; event 0x4; ndb attrflags 0x1000000; ndb pdb_index 0x2/undef
*Mar 28 01:31:53.815: ldp: Need tfib cleanup for peer FastEthernet2/0; 12.12.12.12:0
*Mar 28 01:31:53.819: tib: lsd cleanup for Fa2/0
*Mar 28 01:31:53.823: %LDP-5-SP: 12.12.12.12:0: session hold up initiated
After redirecting the labeled packets to R9 instead of R12, R13 didn’t flush the labels as it did the first time. Instead, it generated a message stating that session hold up has been initiated.
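When watching many peers, it can help to pull just the session protection messages out of a log. The following is a small Python sketch, assuming the syslog format shown in the output above. The function name `session_protection_events` is a hypothetical helper.

```python
import re

# Matches lines such as:
#   %LDP-5-SP: 12.12.12.12:0: session hold up initiated
SP_EVENT = re.compile(r"%LDP-5-SP: (\S+): (.+)$")


def session_protection_events(log_text):
    """Return a list of (peer LDP ID, message) tuples in log order."""
    events = []
    for line in log_text.splitlines():
        match = SP_EVENT.search(line.strip())
        if match:
            events.append((match.group(1), match.group(2)))
    return events
```

Fed the debug output above, it returns a single event for peer 12.12.12.12:0 with the message "session hold up initiated".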
R13#show mpls ldp neighbor
Peer LDP Ident: 9.9.9.9:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 9.9.9.9.646 - 13.13.13.13.13093
    State: Oper; Msgs sent/rcvd: 68/68; Downstream
    Up time: 00:42:53
    LDP discovery sources:
         FastEthernet1/0, Src IP addr: 10.9.13.9
    Addresses bound to peer LDP Ident:
         10.8.9.9     9.9.9.9      10.9.10.9    10.9.12.9
         10.9.13.9
Peer LDP Ident: 12.12.12.12:0; Local LDP Ident 13.13.13.13:0
    TCP connection: 12.12.12.12.646 - 13.13.13.13.30831
    State: Oper; Msgs sent/rcvd: 71/78; Downstream
    Up time: 00:42:53
    LDP discovery sources:
         Targeted Hello 13.13.13.13 -> 12.12.12.12, active, passive
    Addresses bound to peer LDP Ident:
         10.8.12.12   12.12.12.12 10.9.12.12   10.6.12.12
This time, even though the directly connected interface is down, the targeted session is still up, but notice that the physical interface has disappeared from the discovery sources.
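The same check can be sketched in Python: parse the discovery sources of each peer and flag the ones that are held up by a targeted hello alone. This assumes the “show mpls ldp neighbor” layout shown in this post. The names `discovery_sources` and `held_by_targeted_only` are hypothetical helpers.

```python
import re

PEER = re.compile(r"Peer LDP Ident:\s*(\S+);")
TARGETED = re.compile(r"Targeted Hello\s+(\S+)\s*->\s*(\S+?),\s*(.*)")
LINK = re.compile(r"^(\S+), Src IP addr:\s*(\S+)")


def discovery_sources(output):
    """Map each peer LDP ID to its link and targeted discovery sources."""
    peers = {}
    current = None
    in_sources = False
    for raw in output.splitlines():
        line = raw.strip()
        if m := PEER.search(line):
            current = peers.setdefault(m.group(1), {"link": [], "targeted": []})
            in_sources = False
        elif current is None:
            continue  # skip anything before the first peer header
        elif line.startswith("LDP discovery sources"):
            in_sources = True
        elif line.startswith("Addresses bound"):
            in_sources = False
        elif in_sources:
            if m := TARGETED.search(line):
                flags = [f.strip() for f in m.group(3).split(",") if f.strip()]
                current["targeted"].append((m.group(1), m.group(2), flags))
            elif m := LINK.search(line):
                current["link"].append(m.group(1))
    return peers


def held_by_targeted_only(output):
    """Peers that have a targeted hello source but no link discovery source."""
    return sorted(peer for peer, src in discovery_sources(output).items()
                  if src["targeted"] and not src["link"])
```

Run against the output above, `held_by_targeted_only` should list only 12.12.12.12:0, since 9.9.9.9:0 is still discovered over FastEthernet1/0.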
Let’s check if the prefix 6.6.6.6/32 is still there:
R13#show mpls ldp bindings 6.6.6.6 32
 lib entry: 6.6.6.6/32, rev 56
    local binding:  label: 23
    remote binding: lsr: 9.9.9.9:0, label: 22
    remote binding: lsr: 12.12.12.12:0, label: 26
The prefix 6.6.6.6/32 is still in the binding table with the same label, 26. There’s something to note here. By default, the binding is held indefinitely, until either the directly connected interface comes back up or the targeted session goes down (verify the default holdup time on your platform and software release, as it may differ). This behavior can be changed in case you don’t want the router to keep this information forever, for example because you changed the physical topology and the two routers can still reach each other, but it no longer makes sense for them to maintain the session.
Here’s how to change that behavior
R13(config)#mpls ldp session protection duration ?
 <30-2147483>  Holdup time in seconds
 infinite   Protect session forever after loss of link discovery
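If you push this setting from a script, a small guard against out-of-range values can save a rejected command. Below is a minimal Python sketch that builds the command string using only the keywords and the 30 to 2147483 range shown in the help output above. The function name `session_protection_command` is a hypothetical helper, and the range may differ on other releases.

```python
def session_protection_command(duration=None):
    """Build the global command; duration is None, "infinite", or seconds."""
    base = "mpls ldp session protection"
    if duration is None:
        # No duration keyword: the router applies its default holdup behavior.
        return base
    if duration == "infinite":
        return f"{base} duration infinite"
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an int number of seconds or 'infinite'")
    if not 30 <= duration <= 2147483:
        raise ValueError("duration must be between 30 and 2147483 seconds")
    return f"{base} duration {duration}"
```

For example, `session_protection_command(300)` produces a command that holds the session for five minutes after the link discovery is lost, while a value such as 10 raises `ValueError` before anything reaches the router.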

Now for the good stuff: let’s bring the failed link back up and see what happens.

R13#
*Mar 28 01:46:09.891: tib: lsd populate for Fa2/0
*Mar 28 01:46:09.895: %LDP-5-SP: 12.12.12.12:0: session recovery succeeded
*Mar 28 01:46:09.983: tagcon: omit announce labels for: 6.6.6.6/32; nh 10.9.13.9, Fa1/0, from 12.12.12.12:0: Handle peer addr 10.12.13.12
*Mar 28 01:46:18.435: tib: prefix recurs walk start: 6.6.6.6/32, tableid: 0
*Mar 28 01:46:18.439: tib: get path labels: 6.6.6.6/32(0), nh tableid: 0, Fa2/0, nh 10.12.13.12
*Mar 28 01:46:18.439: tib: add a route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), remote label Unknown
*Mar 28 01:46:18.443: tib: update route info for 6.6.6.6/32(0, 10.12.13.12, Fa2/0), with remote label 26 from 12.12.12.12:0
*Mar 28 01:46:18.451: tagcon: announce labels for: 6.6.6.6/32; nh 10.12.13.12, Fa2/0, inlabel 23, outlabel 26 (from 12.12.12.12:0), get path labels
*Mar 28 01:46:18.451: tib: prefix walking remove route info for 6.6.6.6/32(0, 10.9.13.9, Fa1/0), remote label 22 from 9.9.9.9:0
*Mar 28 01:46:18.463: tagcon: rib change: 6.6.6.6/32; event 0x4; proctype 0x200; ndb attrflags 0x1000000; ndb->pdb_index 0x2/undef
*Mar 28 01:46:18.467: tagcon: rib change: 6.6.6.6/255.255.255.255; event 0x4; ndb attrflags 0x1000000; ndb pdb_index 0x2/undef
Compared to the earlier recovery without LDP session protection, R13 didn’t receive a new binding for the prefix 6.6.6.6/32. Instead, it went straight to updating the next hop for that prefix, which is R12.
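That difference can also be checked mechanically by searching a captured debug log for “learn binding” events from the recovered peer. Here is a minimal Python sketch, assuming the debug line format shown in this post. The function name `relearned_bindings` is a hypothetical helper.

```python
import re

# Matches lines such as:
#   tib: 6.6.6.6/32:: learn binding 26 from 12.12.12.12:0
LEARN = re.compile(r"tib: (\S+?):: learn binding (\S+) from (\S+)")


def relearned_bindings(log_text, peer):
    """Return {prefix: label} for bindings newly learned from `peer`."""
    learned = {}
    for line in log_text.splitlines():
        match = LEARN.search(line)
        if match and match.group(3) == peer:
            learned[match.group(1)] = match.group(2)
    return learned
```

On the recovery log captured without session protection, this returns the bindings R13 had to learn again from 12.12.12.12:0. On the log captured with session protection, it returns an empty dictionary, because the bindings were never flushed.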