Saturday, August 10, 2013

BGP load-sharing across different service providers with multipath-relax

In the service provider world, a BGP router might be connected to many other providers in different autonomous systems. By default, BGP doesn’t load-share traffic to the same prefix through different neighboring autonomous systems; the BGP route-selection algorithm ultimately chooses one autonomous system to forward traffic to. The goal of this post is to make a BGP router load-share across its links to different autonomous systems.
First, let’s look at how Cisco’s BGP implementation chooses the best route to a destination prefix:
1.  Exclude routes with inaccessible next-hops.
2.  Prefer the path with the highest WEIGHT.
3.  Prefer the path with the highest LOCAL_PREF.
4.  Prefer the path that was locally originated via a network or aggregate BGP sub-command or through redistribution from an IGP.
5.  Prefer the path with the shortest AS_PATH.
6.  Prefer the path with the lowest origin type.
7.  Prefer the path with the lowest multi-exit discriminator (MED).
8.  Prefer eBGP over iBGP paths.
9.  Prefer the path with the lowest IGP metric to the BGP next hop.
10. Determine if multiple paths require installation in the routing table for BGP Multipath.
11. When both paths are external, prefer the path that was received first (the oldest one).
12. Prefer the route that comes from the BGP router with the lowest router ID.
13. If the originator or router ID is the same for multiple paths, prefer the path with the minimum cluster list length.
14. Prefer the path that comes from the lowest neighbor address.
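
To make the ordering of these steps more concrete, here is a minimal Python sketch that expresses most of them as a sort key. It is only an illustration under simplifying assumptions, not how IOS implements the decision process: the Path class, its field names and the best_path helper are all made up for this post, steps 1, 10 and 11 are left out, and MED is compared between all paths even though by default it is only compared between paths from the same neighboring AS. The two sample paths mirror the ones R1 receives in the example later in this post.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Path:
    """Hypothetical, simplified view of one BGP path (names are illustrative)."""
    neighbor: str
    router_id: str
    as_path: tuple
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    origin: int = 0            # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    cluster_list_len: int = 0


def preference_key(p):
    """Smaller tuple = more preferred. Covers steps 2-9 and 12-14 only."""
    return (
        -p.weight,                         # step 2: highest WEIGHT
        -p.local_pref,                     # step 3: highest LOCAL_PREF
        not p.locally_originated,          # step 4: locally originated first
        len(p.as_path),                    # step 5: shortest AS_PATH
        p.origin,                          # step 6: lowest origin type
        p.med,                             # step 7: lowest MED (simplified)
        not p.ebgp,                        # step 8: eBGP over iBGP
        p.igp_metric,                      # step 9: lowest IGP metric to next hop
        int(IPv4Address(p.router_id)),     # step 12: lowest router ID
        p.cluster_list_len,                # step 13: shortest cluster list
        int(IPv4Address(p.neighbor)),      # step 14: lowest neighbor address
    )


def best_path(paths):
    """Return the single most preferred path."""
    return min(paths, key=preference_key)


via_r2 = Path(neighbor="2.2.2.2", router_id="2.2.2.2", as_path=(200, 400), igp_metric=2)
via_r3 = Path(neighbor="3.3.3.3", router_id="3.3.3.3", as_path=(300, 400), igp_metric=2)

print(best_path([via_r3, via_r2]).neighbor)  # prints 2.2.2.2
```

Both sample paths tie on every attribute up to the router ID, so in this sketch the lower router ID decides and the path through 2.2.2.2 wins.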

To complete the picture, I need to mention one more thing. By default, BGP also doesn’t load-share traffic over multiple links even when the router is multi-homed, because it installs only the single best path. To use more than one connection, we have to configure the BGP sub-command maximum-paths, which on its own enables load-sharing over multiple links to the same neighboring autonomous system.


From the topology shown, the tie-breaker for R1 when installing the prefix 200.0.0.0/24 was either step 11 or step 12, since both paths are external and tie on every earlier step. The result is that R1 prefers R2 to forward traffic destined to 200.0.0.0/24.



Let’s see how it looks on R1:


R1#show run | s router bgp
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 neighbor 2.2.2.2 remote-as 200
 neighbor 2.2.2.2 ebgp-multihop 255
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 3.3.3.3 remote-as 300
 neighbor 3.3.3.3 ebgp-multihop 255
 neighbor 3.3.3.3 update-source Loopback0
 no auto-summary

R1#show ip route

B     200.0.0.0/24 [20/0] via 2.2.2.2, 00:00:51

R1#show ip bgp 200.0.0.0
BGP routing table entry for 200.0.0.0/24, version 13
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     5
  300 400
    3.3.3.3 (metric 2) from 3.3.3.3 (3.3.3.3)
      Origin IGP, localpref 100, valid, external
  200 400
    2.2.2.2 (metric 2) from 2.2.2.2 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, best

It seems that BGP chose the path through R2 as the best path. Now let’s change that behavior by issuing two commands that work in conjunction to achieve our goal of multi-homing R1 to AS200 and AS300.

The first command is a hidden command, bgp bestpath as-path multipath-relax. It tells the router to ignore the fact that the prefix is received from different autonomous systems, as long as the AS_PATH lengths are equal, and to install in the routing table up to the maximum number of paths configured by the second command, maximum-paths:
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 bgp bestpath as-path multipath-relax
 neighbor 2.2.2.2 remote-as 200
 neighbor 2.2.2.2 ebgp-multihop 255
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 3.3.3.3 remote-as 300
 neighbor 3.3.3.3 ebgp-multihop 255
 neighbor 3.3.3.3 update-source Loopback0
 maximum-paths 2
 no auto-summary

Just remember to clear or soft clear the BGP sessions (for example, clear ip bgp * soft) so that the change takes effect.

R1#show ip bgp 200.0.0.0
BGP routing table entry for 200.0.0.0/24, version 2
Paths: (2 available, best #1, table default)
Multipath: eBGP
  Advertised to update-groups:
     6
  200 400
    2.2.2.2 (metric 2) from 2.2.2.2 (2.2.2.2)
      Origin IGP, localpref 100, valid, external, multipath, best
  300 400
    3.3.3.3 (metric 2) from 3.3.3.3 (3.3.3.3)
      Origin IGP, localpref 100, valid, external, multipath
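
To illustrate why both paths now carry the multipath flag, here is a rough Python sketch of the eligibility check. Treat it as a sketch under stated assumptions rather than the IOS implementation: the Path class, the multipath_set function and its parameters are hypothetical, the strict (non-relaxed) rule is assumed to require an identical AS_PATH, and the other eBGP multipath conditions such as equal IGP metric to the next hop are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical, simplified BGP path (names are illustrative)."""
    next_hop: str
    as_path: tuple
    weight: int = 0
    local_pref: int = 100
    origin: str = "IGP"
    med: int = 0


def multipath_set(best, candidates, maximum_paths=1, relax=False):
    """Return the paths that would be installed alongside the best path."""
    chosen = [best]
    for p in candidates:
        if len(chosen) >= maximum_paths:
            break                      # maximum-paths caps the number of entries
        if p == best:
            continue
        same_attrs = (
            (p.weight, p.local_pref, p.origin, p.med)
            == (best.weight, best.local_pref, best.origin, best.med)
        )
        if relax:
            # multipath-relax: only the AS_PATH length has to match
            as_ok = len(p.as_path) == len(best.as_path)
        else:
            # assumed strict rule: the AS_PATH itself has to match
            as_ok = p.as_path == best.as_path
        if same_attrs and as_ok:
            chosen.append(p)
    return chosen


via_r2 = Path(next_hop="2.2.2.2", as_path=(200, 400))
via_r3 = Path(next_hop="3.3.3.3", as_path=(300, 400))
paths = [via_r2, via_r3]

strict = multipath_set(via_r2, paths, maximum_paths=2, relax=False)
relaxed = multipath_set(via_r2, paths, maximum_paths=2, relax=True)

print([p.next_hop for p in strict])   # prints ['2.2.2.2']
print([p.next_hop for p in relaxed])  # prints ['2.2.2.2', '3.3.3.3']
```

The sketch also shows why both commands are needed: with relax=True but maximum_paths left at 1, only the best path is returned.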

Even though BGP still marks the path through R2 as the best one, the routing table shows that prefix 200.0.0.0/24 now has two next hops:

R1#show ip route 200.0.0.0
Routing entry for 200.0.0.0/24
  Known via "bgp 100", distance 20, metric 0
  Tag 200, type external
  Last update from 3.3.3.3 00:05:08 ago
  Routing Descriptor Blocks:
    3.3.3.3, from 3.3.3.3, 00:05:08 ago
      Route metric is 0, traffic share count is 1
      AS Hops 2
      Route tag 200
      MPLS label: none
  * 2.2.2.2, from 2.2.2.2, 00:05:08 ago
      Route metric is 0, traffic share count is 1
      AS Hops 2
      Route tag 200
      MPLS label: none

Target achieved!

Now let’s talk a little about service providers here. Just because you can load-share traffic across different service providers doesn’t mean you always should. Different providers have different equipment, latencies and upstream providers. You may find the lowest latency and jitter on one particular provider, which would make it a great choice for forwarding real-time traffic. The design is totally case-specific; just make sure you get the right pieces together before going for it.