Saturday, May 9, 2026

EVPN, Simplest Example

I've been meaning to write this one up for a while. If you've been working in networking for the last few years, you've probably heard about EVPN-VXLAN. Maybe it's been pitched to you as the next-gen data center fabric, the spine-leaf revolution, or the thing that kills spanning tree for good. All true, by the way.

But reading about it and actually building it are two different things. I spent last weekend wiring up a lab with Arista vEOS containers, and I figured I'd walk through exactly how it works — not the theory, but the actual config. Here's what I built and how it all fits together.

The Topology

I won't bore you with a Visio diagram. Here's what it looks like:

         Spine-1              Spine-2
       11.1.1.1/32          22.2.2.2/32
       100.0.0.1/32         100.0.0.2/32
           |                      |
      10.1.x.x/30-------------10.2.x.x/30
           |                      |
      Leaf-2 ==(MLAG)== Leaf-4    Leaf-3
     100.0.0.22/32             100.0.0.33/32
           |                      |
          EP-1                   EP-2

Five Arista switches: two spines, two leaves in an MLAG pair, and one standalone leaf, plus two endpoints that simulate actual servers. No hardware, just vEOS-lab boxes running EOS 4.26.9M.

Step 1: The Underlay (Just Good Old OSPF)

Before you can do any fancy overlay stuff, the switches need to talk to each other. I went with OSPF because it's dead simple and I don't need BGP at the underlay level for a lab.

Every switch gets a Loopback0 for the router-id and a Loopback1 for VTEP traffic. The point-to-point links between spines and leaves go into OSPF area 0. Here's what it looks like on a spine:

router ospf 1
   router-id 11.1.1.1
   network 10.1.2.0/30 area 0.0.0.0
   network 10.1.3.0/30 area 0.0.0.0
   network 10.1.4.0/30 area 0.0.0.0
   network 11.1.1.1/32 area 0.0.0.0

And a leaf:

router ospf 1
   router-id 2.2.2.2
   network 2.2.2.2/32 area 0.0.0.0
   network 10.1.2.0/30 area 0.0.0.0
   network 10.2.2.0/30 area 0.0.0.0
   network 100.0.0.22/32 area 0.0.0.0

Notice I'm advertising the VTEP loopback (100.0.0.x) into OSPF too. That's critical — the VTEPs need to reach each other's source IPs, otherwise VXLAN encapsulation won't work.
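
A quick way to confirm the underlay before moving on, sketched here with standard EOS show commands run from Leaf-2. The addresses are the loopbacks from the topology above; output is omitted because it depends on the lab.

```
! Both spines should appear as FULL OSPF neighbors
show ip ospf neighbor
! The remote VTEP loopback should be in the routing table, ideally via both spines
show ip route 100.0.0.33
! Ping the remote VTEP, sourcing from the local VTEP address
ping 100.0.0.33 source 100.0.0.22
```

If the sourced ping fails, VXLAN between those two leaves will fail too, so it's worth fixing here first.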

The spine interfaces are pure L3 (no switchport, IP configured). The leaf interfaces facing the spines are also L3. Only the ports facing endpoints are switchports.
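
For reference, here is a minimal sketch of what the loopbacks and routed uplinks could look like on Leaf-2. The loopback addresses and /30 subnets come from the OSPF config above; the port numbers (Ethernet1, Ethernet2) and the .2 host addresses are assumptions made for illustration.

```
! Leaf-2 sketch. Port numbers and the .2 host addresses are assumptions.
ip routing
!
interface Loopback0
   ip address 2.2.2.2/32
!
interface Loopback1
   ip address 100.0.0.22/32
!
interface Ethernet1
   description to Spine-1 (assumed port)
   no switchport
   ip address 10.1.2.2/30
!
interface Ethernet2
   description to Spine-2 (assumed port)
   no switchport
   ip address 10.2.2.2/30
```

The no switchport line is what turns a port into a routed interface; without it the ip address command is rejected.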

Step 2: iBGP for EVPN — Spines as Route Reflectors

This is where it gets interesting. The overlay (EVPN) runs over iBGP with the spines acting as route reflectors. The leaves peer with both spines, and the spines reflect routes between leaves. No full mesh required.

On both spines, the config is almost identical:

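router bgp 65000
   router-id 11.1.1.1
   no bgp default ipv4-unicast
   neighbor 2.2.2.2 remote-as 65000
   neighbor 2.2.2.2 update-source Loopback0
   neighbor 2.2.2.2 route-reflector-client
   neighbor 2.2.2.2 send-community
   ! Leaf-3 needs the same four lines before it can be activated below
   neighbor 3.3.3.3 remote-as 65000
   neighbor 3.3.3.3 update-source Loopback0
   neighbor 3.3.3.3 route-reflector-client
   neighbor 3.3.3.3 send-community
   !
   address-family evpn
      neighbor 2.2.2.2 activate
      neighbor 3.3.3.3 activate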

The route-reflector-client line is the key: it tells the spine to reflect EVPN routes it learns from one leaf to the other leaves. Without it, each leaf would need an iBGP session to every other leaf, which doesn't scale.

On the leaf side, it's simpler:

router bgp 65000
   router-id 2.2.2.2
   neighbor 11.1.1.1 remote-as 65000
   neighbor 11.1.1.1 update-source Loopback0
   neighbor 11.1.1.1 send-community
   neighbor 22.2.2.2 remote-as 65000
   neighbor 22.2.2.2 update-source Loopback0
   neighbor 22.2.2.2 send-community
   !
   address-family evpn
      neighbor 11.1.1.1 activate
      neighbor 22.2.2.2 activate

Why two spines? Redundancy. If Spine-1 dies, EVPN routes still flow through Spine-2. The leaves install routes from both and get ECMP (equal-cost multipath) automatically. You can see this in the route table — routes show up with * >Ec flags, meaning they're reachable via both spines.
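
Two settings that the snippets above don't show are commonly needed on EOS for this step. Treat both as assumptions about this lab rather than its actual config: the multi-agent routing model, which EVPN on EOS depends on, and a maximum-paths value, which is one thing to check if the second spine's path does not show up as ECMP.

```
! Global: EVPN on EOS needs the multi-agent routing model.
! Changing it typically requires a reload to take effect.
service routing protocols model multi-agent
!
router bgp 65000
   ! Assumed value: allow two equal-cost BGP paths, one per spine
   maximum-paths 2 ecmp 2
```

Afterwards, show bgp evpn summary on a leaf should list both 11.1.1.1 and 22.2.2.2 as Established.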

Step 3: VLANs to VNIs — The Heart of VXLAN

VXLAN is basically VLAN-on-steroids. Instead of a 12-bit VLAN ID (max 4094), you get a 24-bit VNI (VXLAN Network Identifier) — 16 million segments. But the real magic is that VXLAN encapsulates L2 frames inside UDP packets, so they can travel across L3 networks.

I mapped VLANs to VNIs like this:

VLAN   VNI     Purpose
10     10010   Management-ish segment
50     10050   Spare / not fully used yet
55     10055   Tenant A workload
66     10066   Tenant B workload (only on Leaf-2/4)

On the leaf, you configure the VXLAN interface like this:

interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 10,50,55,66 vni 10010,10050,10055,10066
   vxlan vrf A vni 55555
   vxlan vrf B vni 66666

source-interface Loopback1 is your VTEP IP. In my case, Leaf-2 and Leaf-4 use 100.0.0.22, Leaf-3 uses 100.0.0.33. The spines don't run VXLAN at all — they're pure L3 switches that just forward the encapsulated UDP packets between VTEPs.
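
The Vxlan1 config references VLANs and VRFs that have to exist first. A minimal sketch of those prerequisites, assuming nothing beyond the VLAN IDs and VRF names already used above:

```
! Assumed prerequisites for the Vxlan1 config above
vlan 10,50,55,66
!
vrf instance A
vrf instance B
!
ip routing vrf A
ip routing vrf B
```

Once Vxlan1 is up, show interfaces Vxlan1 and show vxlan vni are the usual places to confirm that each VLAN and VRF landed on the VNI you expected.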

Step 4: L2 EVPN — Telling Other Switches About MACs

EVPN is the control plane for VXLAN. It's what tells VTEPs "hey, MAC address aa:bb:cc:dd is reachable behind VTEP 100.0.0.33, send your VXLAN traffic there."

There are a few route types in EVPN. The ones that matter most are:

Type 2 (MAC/IP Advertisement): This is what teaches the fabric about host MACs and optionally their IPs. Each leaf runs the redistribute learned command under the VLAN BGP config, which tells BGP to push locally-learned MACs into EVPN:

   vlan 55
      rd 2.2.2.2:10055
      route-target both 10055:10055
      redistribute learned

When EP-2 (connected to Leaf-3) sends a frame, Leaf-3 learns its MAC and advertises it via BGP EVPN to both spines, which reflect it to Leaf-2/4. Now Leaf-2 knows: "MAC 5001.009b.566c is behind 100.0.0.33."
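
To watch that happen, these standard EOS show commands on Leaf-2 cover the control plane and the resulting forwarding state. This is a sketch of where to look, not captured output.

```
! EVPN MAC/IP (Type 2) routes received via the spines
show bgp evpn route-type mac-ip
! MACs learned through VXLAN, with the remote VTEP they sit behind
show vxlan address-table
! The regular MAC table for the VLAN; remote entries point at Vxlan1
show mac address-table vlan 55
```

If the MAC shows up in the first command but not the other two, the route-target on the receiving leaf is a good first suspect.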

Type 3 (IMET — Inclusive Multicast Ethernet Tag): This handles BUM traffic (broadcast, unknown unicast, multicast). Each leaf tells the others "I'm participating in VNI 10055, send BUM traffic to my VTEP IP." Looks like this in the EVPN table:

RD: 2.2.2.2:10055 imet 100.0.0.22
RD: 3.3.3.3:10055 imet 100.0.0.33

ARP requests (broadcasts) get flooded to all VTEPs participating in that VNI.
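
A short sketch for checking the flood list on a leaf, again using standard EOS show commands with output omitted:

```
! IMET (Type 3) routes, one per VTEP per VNI
show bgp evpn route-type imet
! The resulting per-VLAN flood list used for head-end replication
show vxlan flood vtep
```

For VLAN 55 on Leaf-2, the flood list should contain 100.0.0.33 once Leaf-3's IMET route has been imported.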

Step 5: L3 EVPN — Routing Between Subnets

Here's where it gets really cool. With traditional VXLAN, inter-subnet traffic has to hairpin through a gateway. With EVPN, you can run an anycast gateway — every leaf acts as the default gateway for the same subnet, with the same IP AND the same MAC address.

interface Vlan55
   vrf A
   ip address virtual 10.55.55.1/24

Both Leaf-2 and Leaf-3 have this same config. The virtual keyword means the gateway IP 10.55.55.1 is active on both switches simultaneously. Hosts that ARP for their gateway get the same MAC response from any leaf. As far as the hosts are concerned, their gateway is always local.
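
The shared MAC comes from a global setting that has to match on every leaf. A minimal sketch, where the MAC value itself is an assumed example and not taken from this lab:

```
! Global command, identical on every leaf. The MAC value is an assumed example.
ip virtual-router mac-address 00:1c:73:00:00:99
```

show ip virtual-router on each leaf is a quick way to confirm that the gateway IP and virtual MAC line up.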

For routed traffic between leaves, each VRF gets its own Layer 3 VNI. I have VRF A (L3VNI 55555) and VRF B (L3VNI 66666). When a leaf routes a packet toward a prefix that lives on another leaf, it VXLAN-encapsulates the packet with the L3VNI of the VRF that advertised the route. If the destination is on the same leaf, it routes locally.

The BGP config for VRFs looks like:

   vrf A
      rd 2.2.2.2:1
      route-target import evpn 1:1
      route-target export evpn 1:1
      redistribute connected

The Route Target (RT) controls who imports what. RT 1:1 is for VRF A, RT 2:2 is for VRF B. You can cross-import them if you want — Leaf-2's VRF B actually imports both RT 1:1 and RT 2:2, which means VRF B can reach VRF A routes. That's how EP-1 on VLAN 66 (VRF B, 10.66.66.0/24) can reach EP-2 on VLAN 55 (VRF A, 10.55.55.0/24).
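
Based on that description, Leaf-2's VRF B stanza would look roughly like the sketch below. The import and export targets follow the paragraph above; the rd value is an assumption that mirrors the 2.2.2.2:1 pattern used for VRF A.

```
router bgp 65000
   vrf B
      ! rd value is an assumption, following the pattern used for VRF A
      rd 2.2.2.2:2
      route-target import evpn 1:1
      route-target import evpn 2:2
      route-target export evpn 2:2
      redistribute connected
```

Importing 1:1 only gives VRF B a route toward VRF A. For replies to come back, VRF A needs a route to 10.66.66.0/24 as well, for example by importing 2:2 on the leaf where EP-2 lives.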

Type 5 (IP Prefix): These are the L3 EVPN routes. Instead of advertising a MAC, they advertise an IP prefix:

RD: 2.2.2.2:1 ip-prefix 10.55.55.0/24
RD: 3.3.3.3:1 ip-prefix 10.55.55.0/24

This tells the fabric "I have the subnet 10.55.55.0/24 locally." Both leaves advertise it because both have the anycast gateway configured.
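
A sketch of the show commands that tie the Type 5 routes back to the tenant routing table, with output omitted:

```
! Type 5 routes in the EVPN table
show bgp evpn route-type ip-prefix ipv4
! VNI mappings, including which L3VNI belongs to which VRF
show vxlan vni
! What actually got installed in the tenant routing table
show ip route vrf A
```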

Step 6: MLAG — The Ugly Truth About Redundancy

I've got Leaf-2 and Leaf-4 in an MLAG pair. MLAG (Multi-Chassis Link Aggregation) lets you connect a server or switch to two switches using a standard LAG/LACP bundle, making both switches look like a single device.

mlag configuration
   domain-id mlag1
   local-interface Vlan4094
   peer-address 10.40.94.4
   peer-link Port-Channel100

The peer-link (Port-Channel100, built from two 10G links) carries all VLANs plus the MLAG keepalive VLAN 4094. The peer-address is how the two switches talk MLAG protocol.
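
The mlag configuration block depends on a few supporting pieces that aren't shown. A minimal sketch for the Leaf-2 side follows; the trunk group name, member ports, and the local 10.40.94.2/24 address are all assumptions, and only the peer address 10.40.94.4 comes from the config above.

```
! Leaf-2 side sketch. Trunk group name, member ports, and local IP are assumptions.
vlan 4094
   trunk group MLAGPEER
!
no spanning-tree vlan-id 4094
!
interface Vlan4094
   ip address 10.40.94.2/24
!
interface Ethernet5-6
   channel-group 100 mode active
!
interface Port-Channel100
   switchport mode trunk
   switchport trunk group MLAGPEER
```

The trunk group keeps VLAN 4094 confined to the peer-link, so the MLAG control traffic never leaks onto host-facing ports.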

Why MLAG in an EVPN fabric? Because some endpoints don't speak BGP or EVPN and just want a simple LACP bundle to two switches. EP-2 connects to Leaf-3 via Port-Channel5 (two member links) with MLAG. If one leaf goes down, the server keeps forwarding through the other leaf.

One thing that tripped me up: the VTEP source IP on MLAG pairs should use the same IP (100.0.0.22) so that remote VTEPs see both switches as the same VXLAN endpoint. The vxlan virtual-router encapsulation mac-address mlag-system-id line takes care of the MAC part — both switches use the same MLAG system MAC for VXLAN encapsulation, so remote switches don't flip-flop between MAC addresses.
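
Put together, the VTEP-related config that has to be identical on both MLAG peers looks roughly like this. Both values come from the text above; the sketch just shows them side by side.

```
! Same on both MLAG peers (Leaf-2 and Leaf-4)
interface Loopback1
   ip address 100.0.0.22/32
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan virtual-router encapsulation mac-address mlag-system-id
```

show mlag on either peer should report the pair as active, and show vxlan vtep on Leaf-3 should list a single remote VTEP, 100.0.0.22, for the pair.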

The Traffic Flow

Let me walk through what happens when EP-1 (10.55.55.10, connected to Leaf-2) sends a packet to EP-2 (10.55.55.2, connected to Leaf-3):

  1. Same subnet: EP-1 checks its ARP cache. If it already knows EP-2's MAC (learned from an earlier ARP exchange), it sends the frame directly to that MAC.
  2. Leaf-2 receives the frame on access port Ethernet3, VLAN 55. It looks up the destination MAC in its MAC table.
  3. The MAC is remote — Leaf-2 learned it via EVPN Type 2 route: MAC 5001.009b.566c is behind VTEP 100.0.0.33.
  4. VXLAN encapsulation: Leaf-2 takes the original L2 frame and wraps it in a UDP packet with VNI 10055, outer source IP 100.0.0.22, outer destination IP 100.0.0.33.
  5. Routing through spines: The spine doesn't care about VXLAN. It looks at the outer IP header (100.0.0.22 → 100.0.0.33) and routes it via OSPF.
  6. Leaf-3 receives and decapsulates: Takes the outer header off, gets the original L2 frame, and delivers it to EP-2 on Port-Channel5, VLAN 55.

For inter-VRF traffic (say EP-2 10.55.55.2 → EP-1's other IP 10.66.66.10):

  1. EP-2 sends to its default gateway 10.55.55.1.
  2. Leaf-3 receives the packet, routes it in VRF A. Destination is 10.66.66.10, which is in VRF B.
  3. Leaf-3 doesn't have VRF B locally. It VXLAN-encapsulates using L3VNI 66666 and sends to Leaf-2.
  4. Leaf-2 decapsulates, routes in VRF B, and delivers to EP-1 on VLAN 66.
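
To check the inter-VRF path in both directions, a route lookup per VRF is usually enough. A sketch using the addresses from the walkthrough above, with output omitted:

```
! On Leaf-3: how VRF A reaches EP-1's address in VRF B
show ip route vrf A 10.66.66.10
! On Leaf-2: the return path from VRF B toward EP-2
show ip route vrf B 10.55.55.2
```

If either lookup comes back empty, the route-target imports for that VRF are the first thing to revisit.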
