A common problem with MPLS L3 VPNs occurs when the LSP between PE routers is broken, which drops traffic for the customers connected to those PEs. The reason is that VPN labels (the lower label in the stack) are unknown to the P routers. This typically happens because an IGP such as OSPF converges before LDP, so a path between the PEs exists but no labels have been exchanged over it yet. A P router then removes the upper label and sends the packet to its neighbor router, which has no idea what to do with the lower label since it is not aware of the VRF. The result is a black hole in the network until LDP converges again.
A solution to this is LDP-IGP synchronization. The idea is fairly simple: OSPF will not prefer the affected link to the neighbor router until LDP has fully converged over that link, so that packets carrying VPN labels are not forwarded to unaware P routers. Of course, this only helps when you have redundant links or paths between the P routers. If the link is the only path, OSPF will continue to forward packets over it.
Now let's see how this works in the topology below
From the topology above, R8, R9 and R12 are P routers, while R6 and R13 are PE routers. When R13 tries to reach R6, OSPF calculates the shortest path, which is R13 - R12 - R6.
Let's see the configuration of the VRF on R13:
R13#show run vrf a
Building configuration...
Current configuration : 793 bytes
ip vrf a
rd 100:1
route-target export 100:1
route-target import 100:1
!
!
interface FastEthernet2/1
ip vrf forwarding a
ip address 10.1.13.13 255.255.255.0
duplex auto
speed auto
!
!
interface Loopback10
ip vrf forwarding a
ip address 130.130.130.130 255.255.255.255
!
!
router ospf 10 vrf a
log-adjacency-changes
redistribute bgp 1 subnets
network 10.1.13.13 0.0.0.0 area 0
network 130.130.130.130 0.0.0.0 area 0
!
!
router bgp 1
!
address-family ipv4 vrf a
no synchronization
network 130.130.130.130 mask 255.255.255.255
redistribute connected
exit-address-family
!
end
Now let's trace the path to R6's loopback in VRF a:
R13#traceroute vrf a 66.66.66.66 source lo10
Type escape sequence to abort.
Tracing the route to 66.66.66.66
1 10.12.13.12 [MPLS: Labels 22/30 Exp 0] 28 msec 48 msec 48 msec
2 66.66.66.66 72 msec * 76 msec
Let's see which labels R13 is imposing on the packet going to R6:
R13#show mpls forwarding-table vrf a 66.66.66.66 detail
Local Outgoing Prefix Bytes Label Outgoing Next Hop
Label Label or Tunnel Id Switched interface
None 30 66.66.66.66/32[V] \
0 Fa2/0 10.12.13.12
MAC/Encaps=14/22, MRU=1496, Label Stack{22 30}
CA0A585C0038CA073EC400388847 000160000001E000
VPN route: a
No output feature configured
We can see label 22, which is used to forward the packet to R12, and label 30, which is used to identify the VPN.
The catch here is that MP-BGP exchanges updates through the loopbacks in the global routing table. Let's now look at R12's forwarding table entry for R6's global loopback IP address:
R12#show mpls forwarding-table 6.6.6.6
Local Outgoing Prefix Bytes Label Outgoing Next Hop
Label Label or Tunnel Id Switched interface
22 Pop Label 6.6.6.6/32 23060 Fa2/1 10.6.12.6
We can see here that R12 performs penultimate hop popping (PHP) before forwarding the packet to R6. That leaves only label 30, which R6 later uses to identify the VPN.
Now let's simulate an LDP failure between R12 and R6, while keeping the OSPF neighborship up:
R6(config)#int f2/1
R6(config-if)#no mpls ip
R6(config-if)#
*Jun 24 00:31:00.663: %LDP-5-NBRCHG: LDP Neighbor 12.12.12.12:0 (3) is DOWN (LDP disabled on interface)
R12#show mpls forwarding-table 6.6.6.6
Local Outgoing Prefix Bytes Label Outgoing Next Hop
Label Label or Tunnel Id Switched interface
22 No Label 6.6.6.6/32 9598 Fa2/1 10.6.12.6
We can see now that R12 still prefers F2/1 as the outgoing interface, since the path is chosen by the OSPF metric, not by LDP. Since there is no LDP neighborship on interface F2/1, R12 sends the packet with no label. The destination 66.66.66.66 then has to be looked up as a plain IP address, and it will not be found in the global routing table, since that table is not VRF aware in the first place.
To confirm that, the ping from R13 to R6 should fail:
R13#ping vrf a 66.66.66.66 source lo10 repe 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 66.66.66.66, timeout is 2 seconds:
Packet sent with a source address of 130.130.130.130
..........
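The failure above can be sketched as a toy model of the label stack. The label values and prefixes are taken from the outputs in this post. The function names and data structures are made up for illustration, and the sketch assumes that an outgoing "No Label" entry removes the whole label stack, so that the packet ends up in a global (non-VRF) lookup.

```python
# Toy model only: names and tables are illustrative, not a real API.

GLOBAL_ROUTES = {"6.6.6.6", "12.12.12.12"}  # global routing table, simplified
VPN_LABELS_R6 = {30: "vrf a"}               # VPN label learned through MP-BGP


def r12_forward(stack, outgoing):
    """Apply R12's forwarding entry for local label 22 to the label stack."""
    if not stack or stack[0] != 22:
        raise ValueError("R12 expects its local label 22 on top")
    if outgoing == "Pop Label":
        return stack[1:]  # PHP: remove only the top (transport) label
    if outgoing == "No Label":
        return []         # assumption: the whole stack is removed
    raise ValueError(f"unknown outgoing action: {outgoing}")


def egress_lookup(stack, dest_ip):
    """Decide what happens to the packet after it leaves R12."""
    if stack:
        vrf = VPN_LABELS_R6.get(stack[0])
        return f"deliver in {vrf}" if vrf else "drop (unknown label)"
    if dest_ip in GLOBAL_ROUTES:
        return "deliver in global table"
    return "drop (no global route)"


stack_from_r13 = [22, 30]  # Label Stack{22 30} imposed by R13

healthy = egress_lookup(r12_forward(stack_from_r13, "Pop Label"), "66.66.66.66")
broken = egress_lookup(r12_forward(stack_from_r13, "No Label"), "66.66.66.66")
print(healthy)  # deliver in vrf a
print(broken)   # drop (no global route)
```

With the transport LSP intact, label 30 survives to R6 and selects the VRF. Once the stack is gone, only the customer address 66.66.66.66 is left, and that address exists only inside VRF a.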
After this long introduction to the problem, let's see how we can solve this issue using LDP-IGP sync.
As mentioned before, LDP-IGP sync makes sure that traffic will not flow through a link that has an OSPF neighborship but no LDP neighborship. It does so in one of two ways: either by advertising the maximum OSPF metric for that link, or by preventing the OSPF neighborship from coming up until LDP has converged.
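The choice between the two behaviours can be summarised in a few lines. This is a minimal sketch with a hypothetical function name, assuming the decision depends only on whether LDP is up on the link and whether the OSPF adjacency already exists. It is not how IOS implements the feature internally.

```python
def sync_action(ldp_up: bool, ospf_adjacency_up: bool) -> str:
    """Return what OSPF does on a link where LDP-IGP sync is required."""
    if ldp_up:
        # LDP has converged on the link, so it is safe for labelled traffic.
        return "advertise normal metric"
    if ospf_adjacency_up:
        # Adjacency exists but LDP is gone: keep the link, make it unattractive.
        return "advertise maximum metric"
    # Link is just coming up: do not form the adjacency until LDP is ready.
    return "hold OSPF adjacency down"


for ldp, adj in [(True, True), (False, True), (False, False)]:
    print(ldp, adj, "->", sync_action(ldp, adj))
```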
Let's see how this works by configuring LDP-IGP sync on R12:
R12#show run | s router ospf 1
router ospf 1
mpls ldp sync
log-adjacency-changes
R12(config)#int f2/1
R12(config-if)#mpls ldp igp sync
Now let's see how the metric of R12's interface F2/1 is maxed out to 65535:
R12#show ip ospf mpls ldp inter f2/1
FastEthernet2/1
Process ID 1, Area 0
LDP is not configured through LDP autoconfig
LDP-IGP Synchronization : Required
Holddown timer is not configured
Interface is up and sending maximum metric
Now let's check the maximum metric as seen from R12's side:
R12#show ip ospf database router 6.6.6.6
OSPF Router with ID (12.12.12.12) (Process ID 1)
Router Link States (Area 0)
LS age: 571
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 6.6.6.6
Advertising Router: 6.6.6.6
LS Seq Number: 8000000C
Checksum: 0x2E09
Length: 60
Number of Links: 3
Link connected to: a Stub Network
(Link ID) Network/subnet number: 6.6.6.6
(Link Data) Network Mask: 255.255.255.255
Number of MTID metrics: 0
TOS 0 Metrics: 1
Link connected to: a Transit Network
(Link ID) Designated Router address: 10.6.8.8
(Link Data) Router Interface address: 10.6.8.6
Number of MTID metrics: 0
TOS 0 Metrics: 1
Link connected to: a Transit Network
(Link ID) Designated Router address: 10.6.12.12
(Link Data) Router Interface address: 10.6.12.6
Number of MTID metrics: 0
TOS 0 Metrics: 65535
R12 now considers the link between itself and R6 unusable for forwarding packets, since its OSPF metric is the maximum metric. Based on that, OSPF falls back to another path to reach R6.
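The effect of the maximum metric on path selection can be checked with a small shortest-path calculation. This is only a sketch: the link list is inferred from the interface addresses in the outputs of this post, every link is assumed to have OSPF cost 1, and the maximum metric is applied to the R12-R6 link as a whole.

```python
import heapq


def shortest_path(links, src, dst):
    """Plain Dijkstra over undirected links given as {(a, b): cost}."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Assumed topology, all costs 1 (inferred, not taken from a diagram).
links = {
    ("R13", "R12"): 1,
    ("R12", "R6"): 1,
    ("R12", "R8"): 1,
    ("R12", "R9"): 1,
    ("R8", "R9"): 1,
    ("R8", "R6"): 1,
}

before = shortest_path(links, "R13", "R6")

# LDP is down on R12-R6: the link is advertised with the maximum metric.
links_synced = dict(links)
links_synced[("R12", "R6")] = 65535
after = shortest_path(links_synced, "R13", "R6")

print(before)  # (2, ['R13', 'R12', 'R6'])
print(after)   # (3, ['R13', 'R12', 'R8', 'R6'])
```

The link is never removed from the graph. It simply becomes so expensive that any alternative wins, which is also why it is still used when it is the only path.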
Let's run the traceroute again from R13 to see if it works this time:
R13#traceroute 6.6.6.6 probe 1
Type escape sequence to abort.
Tracing the route to 6.6.6.6
1 10.12.13.12 [MPLS: Label 22 Exp 0] 24 msec
2 10.8.9.8 [MPLS: Label 25 Exp 0] 56 msec
3 10.6.8.6 120 msec
The traffic path is now R13-R12-R8-R6. Since LDP is running on all the links involved, VPN traffic should pass normally as before, but over the new path:
R13#ping vrf a 66.66.66.66 source lo10 repe 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 66.66.66.66, timeout is 2 seconds:
Packet sent with a source address of 130.130.130.130
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 12/40/68 ms
Everything seems fine now. But what if R12 has just been rebooted, or the link between R12 and R6 went completely down and has just come back up? In that case synchronization behaves a little differently, but has the same effect as illustrated before.
R12#show ip ospf mpls ldp interface f2/1
FastEthernet2/1
Process ID 1, Area 0
LDP is not configured through LDP autoconfig
LDP-IGP Synchronization : Required
Holddown timer is not configured
Interface is down and pending LDP
In the case of a reload, or of a link failure that lasts until the OSPF timers expire before the link comes up again, synchronization prevents the OSPF neighborship between R12 and R6 from forming on interface FastEthernet2/1. Thus R13 will have to find another route to R6.
R12# show ip ospf interface brief
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Lo0 1 0 12.12.12.12/32 1 LOOP 0/0
Fa2/1 1 0 10.6.12.12/24 1 DOWN 0/0
Fa2/0 1 0 10.12.13.12/24 1 BDR 1/1
Fa1/1 1 0 10.9.12.12/24 1 DR 1/1
Fa1/0 1 0 10.8.12.12/24 1 DR 1/1
Finally, how long does the link stay unusable? Unless you configure a holddown timer for LDP-IGP synchronization, the link remains unusable until the LDP session is restored. With a holddown timer configured, the link becomes usable again when the LDP session is restored or when the timer expires, whichever happens first.
The hold down timer can be configured from the global configuration mode.
R12(config)#mpls ldp igp sync holddown ?
<1-2147483647> Hold down time in milliseconds
R12(config)#mpls ldp igp sync holddown 10000
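The holddown logic can be written down as a small predicate. This is a hedged sketch with a hypothetical function name. It assumes the timer starts when OSPF begins waiting for LDP on the link, and it uses milliseconds to match the CLI help above, so 10000 means 10 seconds.

```python
from typing import Optional


def ospf_may_use_link(ldp_synced: bool, waited_ms: int,
                      holddown_ms: Optional[int] = None) -> bool:
    """Return True once OSPF is allowed to use the link normally."""
    if ldp_synced:
        return True               # LDP converged: no reason to wait
    if holddown_ms is None:
        return False              # no timer: wait for LDP indefinitely
    return waited_ms >= holddown_ms  # timer expired: stop waiting for LDP


print(ospf_may_use_link(False, 60000))         # False, no holddown configured
print(ospf_may_use_link(False, 5000, 10000))   # False, still inside holddown
print(ospf_may_use_link(False, 10000, 10000))  # True, holddown expired
print(ospf_may_use_link(True, 0, 10000))       # True, LDP is up
```

The trade-off is that a short holddown restores the link sooner, but if the timer expires while LDP is still down, the black hole described at the start can reappear.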
LDP-IGP synchronization can be very useful during the period after the IGP has converged but before LDP has converged, to keep L3 VPN traffic from being dropped.