Friday, June 27, 2014

OSPF Sham-Link

A common scenario: a company has two sites in the same OSPF area, and the sites are connected in two ways. The first connection goes through a service provider that runs MPLS and MP-BGP to join the two sites. The second is a direct connection between the two sites, also called a backdoor link.

By default, OSPF prefers intra-area routes over inter-area routes, regardless of cost. This can cause traffic to prefer the backdoor link over the path through the provider MPLS network: when OSPF is used as the PE-CE routing protocol, the PE router acts as an ABR connected to area 0 (the super backbone) between the two sites, so routes learned across the provider arrive at the CE as inter-area routes. Even though the two sites are in the same area and are not contiguous because of the super backbone, BGP can carry the information from one site to the other without the need for a virtual-link.
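The preference described above can be sketched in a few lines of Python. This is only an illustration of the ordering, not how a router implements it: the `OspfRoute` class and `best_ospf_route` function are hypothetical names, the inter-area cost of 32 is an illustrative value, and using cost as the only tie-breaker inside a path type is a simplification.

```python
from dataclasses import dataclass

# OSPF path-type preference: a lower rank wins before cost is even compared.
PATH_TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external-1": 2, "external-2": 3}


@dataclass(frozen=True)
class OspfRoute:
    prefix: str
    path_type: str
    cost: int
    via: str


def best_ospf_route(candidates):
    """Pick the preferred route: path type first, cost only as a tie-breaker."""
    return min(candidates, key=lambda r: (PATH_TYPE_RANK[r.path_type], r.cost))


# Two ways for CE-A2 to reach the loopback of CE-B2 (inter-area cost is illustrative).
backdoor = OspfRoute("40.3.3.3/32", "intra-area", 1001, "Ethernet0/1 (backdoor)")
mpls = OspfRoute("40.3.3.3/32", "inter-area", 32, "Ethernet0/0 (MPLS)")

winner = best_ospf_route([backdoor, mpls])
print(winner.via, winner.cost)
```

The backdoor route wins even though its cost is far higher, because the cost is never compared across different path types.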

The solution is the sham-link: a logical link that OSPF treats as a normal intra-area link between two routers, so it is included in the OSPF topology table and in the SPF calculation.

In the topology below, CE-A2 and CE-B2 are both connected to the ISP through their E0/0 interfaces with a capacity of 10 Mb, and to each other via a backdoor link through the E0/2 interface with a capacity of 5 Mb.





Assume that OSPF has fully converged and MP-BGP has propagated the routes between all sites. Let’s first check what CE-A2 sees in its routing table:

CE-A2#show ip route

      20.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
O IA     20.2.2.2/32 [110/21] via 40.0.12.1, 00:01:45, Ethernet0/0
      40.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
C        40.0.12.0/24 is directly connected, Ethernet0/0
L        40.0.12.2/32 is directly connected, Ethernet0/0
O        40.0.22.0/24 [110/1020] via 50.0.0.2, 00:01:40, Ethernet0/1
C        40.2.2.2/32 is directly connected, Loopback0
O        40.3.3.3/32 [110/1001] via 50.0.0.2, 00:01:40, Ethernet0/1
      50.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        50.0.0.0/24 is directly connected, Ethernet0/1
L        50.0.0.1/32 is directly connected, Ethernet0/1


Let’s also check how PE2 is handling those routes:

PE2#show ip bgp vpnv4 vrf a 
BGP table version is 75, local router ID is 20.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found


     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 1:1 (default for vrf a)
 *>  20.2.2.2/32      0.0.0.0                  0         32768 ?
 * i 40.0.12.0/24     20.3.3.3              1030    100      0 ?
 *>                   0.0.0.0                  0         32768 ?
 *>  40.0.22.0/24     40.0.12.2             1030         32768 ?
 * i                  20.3.3.3                 0    100      0 ?
 * i 40.2.2.2/32      20.3.3.3              1011    100      0 ?
 *>                   40.0.12.2               11         32768 ?
 *>  40.3.3.3/32      40.0.12.2             1011         32768 ?
 * i                  20.3.3.3                11    100      0 ?
 * i 50.0.0.0/24      20.3.3.3              1010    100      0 ?
 *>                   40.0.12.2             1010         32768 ?

The detailed entry also shows why BGP on PE2 preferred the path with the higher metric. OSPF on PE2 had already chosen the intra-area route learned from the CE, whose metric includes the backdoor cost of 1000, and when that route is redistributed into BGP its OSPF metric is preserved as the BGP metric (MED). The weight was not raised on purpose: a locally sourced path carries the default weight of 32768, and weight is compared before the metric, so the local path wins over the iBGP path from PE3 (weight 0) whatever its metric is.

PE2#show ip bgp vpnv4 vrf a 40.3.3.3
BGP routing table entry for 1:1:40.3.3.3/32, version 97
Paths: (2 available, best #1, table a)
  Advertised to update-groups:
     1        
  Refresh Epoch 1
  Local
    40.0.12.2 from 0.0.0.0 (20.2.2.2)
      Origin incomplete, metric 1011, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:1:1 OSPF DOMAIN ID:0x0005:0x0000000A0200
        OSPF RT:0.0.0.1:2:0 OSPF ROUTER ID:20.2.2.2:0
      mpls labels in/out 19/nolabel
  Refresh Epoch 1
  Local
    20.3.3.3 (metric 20) from 20.3.3.3 (20.3.3.3)
      Origin incomplete, metric 11, localpref 100, valid, internal
      Extended Community: RT:1:1 OSPF DOMAIN ID:0x0005:0x0000000A0200
        OSPF RT:0.0.0.1:2:0 OSPF ROUTER ID:40.0.22.1:0
      mpls labels in/out 19/29
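
The two paths in this output can be compared with a small Python sketch. It is a deliberately reduced version of the BGP best-path decision that keeps only weight, local preference, local origination and metric (AS-path length, origin code and the later steps are left out), and `BgpPath`, `preference_key` and `best_path` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpPath:
    next_hop: str
    weight: int
    local_pref: int
    locally_sourced: bool
    med: int


def preference_key(path):
    # Higher weight, higher local preference and locally sourced paths win;
    # the metric (lower is better) is only reached if all earlier steps tie.
    return (-path.weight, -path.local_pref, not path.locally_sourced, path.med)


def best_path(paths):
    """Return the path a router would mark as best under this reduced model."""
    return min(paths, key=preference_key)


# The two paths for 40.3.3.3/32 as seen on PE2.
local = BgpPath("40.0.12.2", weight=32768, local_pref=100, locally_sourced=True, med=1011)
remote = BgpPath("20.3.3.3", weight=0, local_pref=100, locally_sourced=False, med=11)

print(best_path([local, remote]).next_hop)
```

The path through 40.0.12.2 is selected on weight alone, so its much higher metric of 1011 never takes part in the comparison.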

Even though the cost of the backdoor link is 1000, far higher than the cost of the link to the MPLS core, the backdoor path is still preferred because the LSAs received from the CE neighbor describe an intra-area route.

The problem here is that the backdoor link can handle only very little traffic. It exists as a backup in case the service provider suffers an outage or some other network failure, so it makes no sense for it to carry all the traffic during normal network operation.

To solve this issue, let’s create a sham-link between the two PE routers. Before we do that, there’s a caveat that has to be considered for the sham-link to come up properly.

The sham-link endpoint addresses must be /32 addresses, and they must be advertised between the PEs through MP-iBGP only. The reason is that OSPF has a lower administrative distance than iBGP. If an endpoint address is also advertised through OSPF, the PE router will prefer the OSPF route to the remote endpoint as soon as the sham-link comes up, which in turn brings the sham-link down. The PE then falls back to the BGP route to the endpoint, the sham-link comes up again, the OSPF route is preferred again, and the result is a continuously flapping sham-link. That’s why you should never advertise the sham-link endpoint addresses through OSPF.
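
The administrative distance argument can be illustrated with a short Python sketch. The table holds the Cisco IOS default distances; `installed_source` is a hypothetical helper that only models "lowest distance wins" and nothing else about route installation.

```python
# Default administrative distances on Cisco IOS (the lowest one wins).
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ebgp": 20, "ospf": 110, "ibgp": 200}


def installed_source(sources):
    """Return the protocol whose route would be installed in the routing table."""
    return min(sources, key=ADMIN_DISTANCE.__getitem__)


# Remote sham-link endpoint 20.33.33.33/32 as seen from PE2.
print(installed_source(["ibgp"]))          # endpoint known through MP-iBGP only
print(installed_source(["ibgp", "ospf"]))  # endpoint also leaked into OSPF
```

In the second case the OSPF route replaces the iBGP route, which is the situation that starts the flapping described above.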

PE2#

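! Sham-link endpoint: a /32 loopback inside VRF a
interface Loopback20
 ip vrf forwarding a
 ip address 20.22.22.22 255.255.255.255
!
! Advertise the endpoint to the remote PE through MP-iBGP only
router bgp 1
 address-family ipv4 vrf a
  network 20.22.22.22 mask 255.255.255.255
  redistribute ospf 10 match internal external 1 external 2 nssa-external 1 nssa-external 2
!
! area 1 sham-link <local endpoint> <remote endpoint>
router ospf 10 vrf a
 area 1 sham-link 20.22.22.22 20.33.33.33
 redistribute bgp 1 subnets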


PE3#

! Sham-link endpoint: a /32 loopback inside VRF a
interface Loopback20
 ip vrf forwarding a
 ip address 20.33.33.33 255.255.255.255
!
! Advertise the endpoint to the remote PE through MP-iBGP only
router bgp 1
 address-family ipv4 vrf a
  network 20.33.33.33 mask 255.255.255.255
  redistribute ospf 10 match internal external 1 external 2 nssa-external 1 nssa-external 2
!
! area 1 sham-link <local endpoint> <remote endpoint>
router ospf 10 vrf a
 area 1 sham-link 20.33.33.33 20.22.22.22
 redistribute bgp 1 subnets


As soon as the configuration is applied, the sham-link should come up:

*Jun 26 01:46:56.400: %OSPF-5-ADJCHG: Process 10, Nbr 20.2.2.2 on OSPF_SL0 from LOADING to FULL, Loading Done

Notice that the interface in the log message is OSPF_SL0 rather than a normal physical Ethernet or POS interface.

Let’s verify the sham-link status:
PE2#show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
40.0.22.1         0   FULL/  -           -        20.33.33.33     OSPF_SL0
40.2.2.2          1   FULL/BDR        00:00:39    40.0.12.2       Ethernet0/2
PE2#show ip ospf sham-links
Sham Link OSPF_SL0 to address 20.33.33.33 is up
Area 1 source address 20.22.22.22
  Run as demand circuit
  DoNotAge LSA allowed. Cost of using 1 State POINT_TO_POINT,
  Timer intervals configured, Hello 10, Dead 40, Wait 40,
    Hello due in 00:00:04
    Adjacency State FULL (Hello suppressed)
    Index 1/1, retransmission queue length 0, number of retransmission 0
    First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
    Last retransmission scan length is 0, maximum is 0
    Last retransmission scan time is 0 msec, maximum is 0 msec

Notice also that the hellos are suppressed. As the output shows, the sham-link runs as a demand circuit, so once the adjacency is FULL no periodic hellos are sent between PE2 and PE3, and the reachability of the remote endpoint is already tracked through its BGP route.

Now let’s check the CE-A2 routing table again:

CE-A2#show ip route

      20.0.0.0/32 is subnetted, 3 subnets
O IA     20.2.2.2 [110/21] via 40.0.12.1, 00:37:33, Ethernet0/0
O E2     20.22.22.22 [110/1] via 40.0.12.1, 00:11:06, Ethernet0/0
O E2     20.33.33.33 [110/1] via 40.0.12.1, 00:10:49, Ethernet0/0
      40.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
C        40.0.12.0/24 is directly connected, Ethernet0/0
L        40.0.12.2/32 is directly connected, Ethernet0/0
O        40.0.22.0/24 [110/31] via 40.0.12.1, 00:10:43, Ethernet0/0
C        40.2.2.2/32 is directly connected, Loopback0
O        40.3.3.3/32 [110/32] via 40.0.12.1, 00:10:43, Ethernet0/0
      50.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        50.0.0.0/24 is directly connected, Ethernet0/1
L        50.0.0.1/32 is directly connected, Ethernet0/1

The loopback of CE-B2 is now preferred through the MPLS core. Both paths are now seen as intra-area, so the tie-breaker between the PE-CE link and the backdoor link is the metric, and the metric through the PE-CE link (32) is much lower than the metric through the backdoor link (1001).

Let’s confirm that the backdoor link metric always needs to be higher than the metric through the PE-CE link by removing the configured cost from the backdoor interface:

CE-A2(config)#interface Ethernet0/1
CE-A2(config-if)#no ip ospf cost 1000



CE-A2#show ip route

      20.0.0.0/32 is subnetted, 3 subnets
O IA     20.2.2.2 [110/21] via 40.0.12.1, 00:42:18, Ethernet0/0
O E2     20.22.22.22 [110/1] via 40.0.12.1, 00:15:51, Ethernet0/0
O E2     20.33.33.33 [110/1] via 40.0.12.1, 00:15:34, Ethernet0/0
      40.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
C        40.0.12.0/24 is directly connected, Ethernet0/0
L        40.0.12.2/32 is directly connected, Ethernet0/0
O        40.0.22.0/24 [110/30] via 50.0.0.2, 00:00:09, Ethernet0/1
C        40.2.2.2/32 is directly connected, Loopback0
O        40.3.3.3/32 [110/11] via 50.0.0.2, 00:00:09, Ethernet0/1
      50.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        50.0.0.0/24 is directly connected, Ethernet0/1
L        50.0.0.1/32 is directly connected, Ethernet0/1

CE-A2 now prefers the intra-area route through the backdoor link again, but this time because its metric is lower. It is a best practice to set the OSPF cost of the backdoor link to a very high value, so that whatever metric is received from the PE routers, it will always be much lower than the metric through the backdoor link.
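
The two results can be reproduced with a small shortest-path sketch in Python. The link costs are assumptions inferred from the metrics in the routing tables above (20 on the CE-A2 uplink, 1 on the sham-link, 10 from PE3 to CE-B2, 1 for the loopback), not values stated in the configuration, and `build_links` and `shortest_path` are hypothetical helpers.

```python
import heapq

LOOPBACK_COST = 1  # assumed stub cost of Loopback0 on CE-B2 (40.3.3.3/32)


def build_links(backdoor_cost):
    """Outbound interface costs toward CE-B2, inferred from the metrics above."""
    return {
        "CE-A2": {"PE2": 20, "CE-B2": backdoor_cost},
        "PE2": {"PE3": 1},  # sham-link, "Cost of using 1"
        "PE3": {"CE-B2": 10},
        "CE-B2": {},
    }


def shortest_path(links, src, dst):
    """Plain Dijkstra; returns (total_cost, first_hop) from src to dst."""
    queue = [(0, src, None)]
    visited = set()
    while queue:
        cost, node, first_hop = heapq.heappop(queue)
        if node == dst:
            return cost, first_hop
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in links[node].items():
            hop = neighbor if node == src else first_hop
            heapq.heappush(queue, (cost + link_cost, neighbor, hop))
    raise ValueError(f"{dst} unreachable from {src}")


for backdoor_cost in (1000, 10):
    cost, first_hop = shortest_path(build_links(backdoor_cost), "CE-A2", "CE-B2")
    print(backdoor_cost, cost + LOOPBACK_COST, first_hop)
```

With a backdoor cost of 1000 the sketch picks PE2 as the first hop with a total of 32, and with the default cost of 10 it picks CE-B2 directly with a total of 11, matching the [110/32] and [110/11] entries for 40.3.3.3/32.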

The cost of the sham-link can also be adjusted on the PE routers. SPF treats the sham-link as a point-to-point link with the configured cost:

PE2(config)#router ospf 10 vrf a
PE2(config-router)#area 1 sham-link 20.22.22.22 20.33.33.33 cost 1


Hopefully that wraps up the sham-link. If you have comments, please share them in the comments section below.