Friday, July 25, 2014

Understanding the difference between MTU and IP MTU

The Maximum Transmission Unit (MTU) is a topic that has always confused a lot of people. In this series I’ll try to explain it, as many have before me, and hopefully clear up some of the confusion.


Let’s first examine how a network device sends a frame



Let’s first identify the components needed for an Ethernet frame to be sent successfully
1.      Preamble, Start of Frame and Inter-frame Gap aren’t part of the frame; rather, they help Ethernet frames to be sent and received without errors or collisions
2.      The Ethernet frame encapsulates the protocol carried inside it, which can be IPv4 (0x0800), IPv6 (0x86DD), ARP (0x0806) etc. It consists of Destination MAC, Source MAC, Frame Type (EtherType) and a CRC (the Frame Check Sequence, FCS) for error checking. It’s also worth noting that the CRC is generated and checked by the interface hardware itself, so it can be excluded from the MTU calculations as well (note: Wireshark usually doesn’t display the FCS, so exclude it from your calculation when using it)
3.      The IP packet (IPv4 or IPv6) includes the source and destination IPs and identifies the protocol carried inside it, e.g. TCP, UDP, ICMP etc.
4.      The protocol mentioned in the point above also has a header: TCP = 20 bytes (without options), UDP = 8 bytes, ICMP = 8 bytes
5.      Payload is the actual data itself

We’ll just disregard the Preamble, Start of Frame and Inter-frame Gap, since they aren’t part of the frame itself but rather a mechanism to differentiate between frames. They were worth mentioning anyway.
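To make the frame header layout concrete, here is a minimal Python sketch that splits the first 14 bytes of a frame into Destination MAC, Source MAC and Frame Type. The function names and the sample frame (including its MAC addresses) are made up for illustration; only the EtherType values come from the list above.

```python
import struct

# EtherType values listed in the text above.
ETHERTYPES = {0x0800: "IPv4", 0x86DD: "IPv6", 0x0806: "ARP"}


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet_header(frame: bytes):
    """Return (destination MAC, source MAC, protocol name) from the first 14 bytes."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header is 14 bytes long")
    # 6 bytes destination MAC, 6 bytes source MAC, 2 bytes frame type (network byte order).
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return format_mac(dst), format_mac(src), ETHERTYPES.get(ethertype, hex(ethertype))


# Hypothetical frame: made-up MAC addresses, EtherType 0x0800, 46 zero bytes of data.
sample = bytes.fromhex("c20072790000" + "c20072780000" + "0800") + bytes(46)
print(parse_ethernet_header(sample))
```

Note that the preamble, Start of Frame and FCS never show up here, which matches what a capture tool normally hands you.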


After discarding the non-frame components, here’s what it looks like for an ICMP packet as an example



Frame Sizes
·         The minimum Ethernet frame size is 64 bytes (including the 4-byte FCS)
Assuming this is an ICMP packet, let’s see what the payload can be with the minimum Ethernet frame size

Field              Bytes
Destination MAC    6
Source MAC         6
Frame Type         2
IP Header          20
ICMP Header        8
Payload            18
FCS                4
Total Frame        64

Frames smaller than 64 bytes are padded to reach 64 bytes. In other words, if the Layer 3 data is less than 46 bytes, padding bytes are added at Layer 2
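The padding rule can be written down as a few lines of Python. This is only a sketch of the arithmetic, assuming an untagged frame and counting the 4-byte FCS as part of the 64-byte minimum; the constant and function names are mine.

```python
ETH_HEADER = 14      # destination MAC (6) + source MAC (6) + frame type (2)
FCS = 4              # frame check sequence
MIN_FRAME = 64       # minimum Ethernet frame, FCS included
MIN_L3_DATA = MIN_FRAME - ETH_HEADER - FCS   # 46 bytes


def padding_needed(l3_bytes: int) -> int:
    """Bytes of Layer 2 padding added when the Layer 3 data is below the minimum."""
    return max(0, MIN_L3_DATA - l3_bytes)


for l3_bytes in (28, 46, 100):
    print(l3_bytes, "bytes of Layer 3 data ->", padding_needed(l3_bytes), "bytes of padding")
```

A bare IP header plus ICMP header (28 bytes) needs 18 bytes of padding, while anything of 46 bytes or more needs none.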

·         The maximum Ethernet frame size is 1518 bytes (including the 4-byte FCS; an 802.1Q VLAN tag adds another 4 bytes, for 1522)
Taking an ICMP packet as an example again

Field              Bytes
Destination MAC    6
Source MAC         6
Frame Type         2
IP Header          20
ICMP Header        8
Payload            1472
FCS                4
Total Frame        1518
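The same subtraction works for other protocols. The sketch below assumes an IPv4 header without options and uses the Layer 4 header sizes quoted earlier (TCP without options); the names are illustrative only.

```python
IP_HEADER = 20   # IPv4 header without options

# Layer 4 header sizes as quoted earlier in the article (TCP without options).
L4_HEADERS = {"TCP": 20, "UDP": 8, "ICMP": 8}


def max_payload(ip_mtu: int, protocol: str) -> int:
    """Largest payload that fits in one unfragmented IP packet of ip_mtu bytes."""
    return ip_mtu - IP_HEADER - L4_HEADERS[protocol]


for protocol in ("ICMP", "UDP", "TCP"):
    print(protocol, max_payload(1500, protocol))
```

With a 1500-byte IP packet this gives 1472 bytes for ICMP and UDP, and 1460 bytes for TCP.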


Now why is the default MTU 1500 bytes?
Well, there were several reasons for the 1500-byte limit:
·         Early network interface cards didn’t have buffers large enough to hold big frames, so 1500 bytes seemed “ok” at the time
·         Since Ethernet was a shared medium, the longer the frame, the longer other devices on the same medium had to wait before they could transmit
·         The bigger the frame, the more likely it is to be corrupted in transit and have to be resent, which again introduces delay in the network.
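To put a number on the waiting time, here is a rough Python sketch of how long a frame occupies the wire. It assumes classic 10 Mbit/s Ethernet and ignores the preamble and inter-frame gap; the 9000-byte frame is a hypothetical value included only for comparison.

```python
def serialization_time_ms(frame_bytes: int, link_bps: float) -> float:
    """Time in milliseconds to clock frame_bytes onto a link of link_bps bits per second."""
    return frame_bytes * 8 / link_bps * 1000


TEN_MBPS = 10_000_000

for frame_bytes in (64, 1518, 9000):
    print(f"{frame_bytes:>5} bytes: {serialization_time_ms(frame_bytes, TEN_MBPS):.4f} ms")
```

At 10 Mbit/s a maximum-size 1518-byte frame holds the medium for a little over 1.2 ms, and every other station has to wait that long before it can transmit.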

Now let’s check this simple topology below,


By default, every interface has both an MTU and an IP MTU of 1500 bytes.

R1#show int f0/0
FastEthernet0/0 is up, line protocol is up
  Hardware is Gt96k FE, address is c200.7278.0000 (bia c200.7278.0000)
  Internet address is 10.1.2.1/24
  MTU 1500 bytes, BW 10000 Kbit/sec, DLY 1000 usec,


R1#show ip int f0/0
FastEthernet0/0 is up, line protocol is up
  Internet address is 10.1.2.1/24
  Broadcast address is 255.255.255.255
  Address determined by setup command
  MTU is 1500 bytes

Let’s ping from R1 to R3 with a packet size of 36 and see how it goes

R1#ping 10.2.3.3 size  36

Type escape sequence to abort.
Sending 5, 36-byte ICMP Echos to 10.2.3.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/39/60 ms


Now let’s see the captured traffic


To make things clear, Wireshark typically doesn’t capture the 4-byte FCS, which is why using Wireshark to calculate the MTU sometimes gets confusing

Now let’s do some additions

After pinging with the minimum allowed size of 36 bytes:

Field              Bytes
Destination MAC    6
Source MAC         6
Frame Type         2
IP Header          20   (IP total length = 36)
ICMP Header        8
Payload            8
Extra Padding      10
Total Frame        60


We can also conclude that the size given in the IOS ping command means the size of the IP packet, that is, the IP header and everything it carries, and excludes the frame header
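The additions above can be reproduced for any ping size. The following Python sketch assumes an untagged Ethernet frame, an IPv4 header without options, and a capture that does not include the FCS (so the minimum captured frame is 60 bytes); the function name is hypothetical.

```python
ETH_HEADER = 14        # destination MAC + source MAC + frame type
IP_HEADER = 20         # IPv4 header without options
ICMP_HEADER = 8
MIN_CAPTURED = 60      # 64-byte minimum frame minus the 4-byte FCS


def ping_breakdown(size: int) -> dict:
    """Break an IOS 'ping ... size N' value down into its on-the-wire parts."""
    icmp_payload = size - IP_HEADER - ICMP_HEADER
    unpadded = ETH_HEADER + size
    padding = max(0, MIN_CAPTURED - unpadded)
    return {
        "icmp_payload": icmp_payload,
        "padding": padding,
        "captured_frame": unpadded + padding,   # what a capture without FCS shows
    }


print(ping_breakdown(36))
print(ping_breakdown(1500))
```

A size of 36 yields 8 bytes of ICMP payload, 10 bytes of padding and a 60-byte captured frame, while a size of 1500 yields a 1514-byte captured frame with no padding.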


Now let’s try pinging with a packet size of 1500 bytes with the Do-Not-Fragment (DF) bit set


R1#ping 10.2.3.3 size 1500 df-bit

Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 10.2.3.3, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/41/48 ms


So we can now conclude one thing: with the MTU set to 1500 on a Cisco IOS interface, the largest frame is actually 1514 bytes, not 1500 (it’s even 1518, but we decided to neglect the FCS for now). In other words, Cisco excludes the frame header from the MTU command.

Let’s try pinging with a packet size larger than 1500 (larger than both the MTU and the IP MTU) with the DF bit set

R1#ping 10.2.3.3 size 1501 df-bit

Type escape sequence to abort.
Sending 5, 1501-byte ICMP Echos to 10.2.3.3, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)

It’s failing, but can we increase the IP MTU on F0/0 interface on R1?

R1(config)#int f0/0
R1(config-if)#ip mtu ?
  <68-1500>  MTU (bytes)

It seems that we can’t, but what about MTU on interface F0/0?

R1(config-if)#mtu ?
  <64-1600>  MTU size in bytes

R1(config-if)#mtu 1600

Now let’s try pinging R3 again

R1#ping 10.2.3.3 size 1501 df-bit

Type escape sequence to abort.
Sending 5, 1501-byte ICMP Echos to 10.2.3.3, timeout is 2 seconds:
Packet sent with the DF bit set
M.M.M

You can see that the interface is now capable of sending frames larger than 1500 bytes, but the ping returned an error because of the IP packet size (the “M” in the output means the packet could not be fragmented). In plain English, the interface hardware can send frames carrying up to 1600 bytes, but it can’t send IP packets larger than 1500 bytes because the configured IP MTU is only 1500. Of course this is platform specific; other types of hardware support MTUs of up to 9192 bytes

But now that the MTU is 1600, the IP MTU can be increased to 1600 as well, which leads us to the fact that IP MTU <= MTU (less than or equal)
R1(config)#int f0/0
R1(config-if)#ip mtu ?
  <68-1600>  MTU (bytes)

We can now set our IP MTU as high as 1600 bytes. With the IP MTU raised to 1600, let’s try to ping with an unfragmented packet larger than 1500 again

R1#ping 10.1.2.2 size 1600 df-bit

Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 10.1.2.2, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!

Now that’s more like it. Again, to clear up any confusion, the MTU and the IP MTU can be configured with the same size because Cisco IOS doesn’t count the 14 bytes of Layer 2 header in the MTU command, so even if I ping with the maximum MTU, which is 1600, the router still has room for the 14 bytes of Layer 2 header.
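The outcomes of the tests so far can be summed up as a small decision function. This is a simplified Python model of the sending side only, not how IOS is implemented; the function name and the returned strings are my own.

```python
def egress_decision(packet_size: int, mtu: int, ip_mtu: int, df_bit: bool) -> str:
    """Simplified model of what a sending interface does with an IP packet."""
    if ip_mtu > mtu:
        # IOS only offers an IP MTU range that ends at the interface MTU.
        raise ValueError("IP MTU cannot exceed the interface MTU")
    if packet_size <= ip_mtu:
        return "send"
    if df_bit:
        return "drop"        # too big and not allowed to fragment
    return "fragment"


print(egress_decision(1500, mtu=1500, ip_mtu=1500, df_bit=True))   # fits
print(egress_decision(1501, mtu=1600, ip_mtu=1500, df_bit=True))   # too big, DF set
print(egress_decision(1600, mtu=1600, ip_mtu=1600, df_bit=True))   # fits after raising both
```

The second call corresponds to the ping that came back with “M”: the interface MTU had room, but the IP MTU did not and the DF bit ruled out fragmentation.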

Another question might arise: why do I need both the MTU and the IP MTU commands? Shouldn’t I just increase the MTU and move along? Well, the reason is that you might want to make room for headers that sit between the Layer 2 header and the Layer 3 packet while keeping the Layer 3 packets the same size. That is, I might want to make room for MPLS labels, Q-in-Q and VLAN tags without increasing the Layer 3 size; in that case I’ll increase the MTU but keep the IP MTU as it is.
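As a rough illustration of how much extra room that takes, the Python sketch below adds 4 bytes per MPLS label and 4 bytes per 802.1Q tag on top of the IP MTU. It assumes, as the paragraph above does, that these headers count against the interface MTU; whether a given platform really counts VLAN tags that way varies, so treat the result as an estimate. The function name is hypothetical.

```python
MPLS_LABEL = 4     # bytes per MPLS label
DOT1Q_TAG = 4      # bytes per 802.1Q tag (Q-in-Q uses two)


def required_mtu(ip_mtu: int, mpls_labels: int = 0, vlan_tags: int = 0) -> int:
    """Interface MTU needed to carry ip_mtu bytes of IP plus the extra headers."""
    return ip_mtu + mpls_labels * MPLS_LABEL + vlan_tags * DOT1Q_TAG


print(required_mtu(1500, mpls_labels=2))               # two-label MPLS stack
print(required_mtu(1500, vlan_tags=2))                 # Q-in-Q
print(required_mtu(1500, mpls_labels=3, vlan_tags=2))  # both
```

Keeping the IP MTU at 1500 and raising the MTU to the computed value leaves the Layer 3 packet size unchanged while the extra headers still fit.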


All the previous tests were done from R1, on which I was configuring the MTU, but how would a transit router react to a bigger MTU, IP MTU or both? (I have to confess that all this time, R2 and R3 had the maximum MTU and IP MTU configured, which is 1600.)

Well, this is very interesting: when a router is sending a packet out of an interface, it will fragment the packet (unless the DF bit is set) even if it is 10 times bigger than its IP MTU, but if it receives a packet larger than its MTU, it will just discard it.

We can test that by simply setting R1 F0/0 interface with MTU of 1600 and IP MTU of 1600 then setting R2 F0/0 interface to MTU of 1500 and IP MTU of 1500

Now let’s ping with 1600 from R1 but this time I will NOT set the DF-Bit which means I’m giving R2 the liberty to do whatever it wants with the packet

R1#ping 10.1.2.2 size 1600

Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 10.1.2.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)


R2 isn’t even replying with fragmented packets to R1, which shows that an interface won’t accept a packet larger than its MTU

Now let’s set R2 MTU to 1600 and IP MTU to 1500

R2(config)#int f0/0
R2(config-if)#mtu 1600
R2(config-if)#ip mtu 1500


We’ll ping with 1600 bytes from R1 without DF-Bit again


R1#ping 10.1.2.2 size 1600

Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 10.1.2.2, timeout is 2 seconds:
!!!!!



This time the packet made it through R2’s interface, and even though the IP packet size is 1600 and R2’s IP MTU is 1500, R2 managed to fragment the reply packets
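To see what the fragmented reply would look like, here is a minimal Python sketch of IPv4 fragmentation arithmetic. It assumes a 20-byte IP header without options and reports offsets in bytes (the IP header itself stores them in 8-byte units); the function is illustrative, not taken from any router.

```python
IP_HEADER = 20   # IPv4 header without options


def fragment(packet_size: int, ip_mtu: int):
    """Split an IPv4 packet into (offset, total_length, more_fragments) tuples."""
    data = packet_size - IP_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes of data.
    chunk = (ip_mtu - IP_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < data:
        size = min(chunk, data - offset)
        more = offset + size < data
        fragments.append((offset, size + IP_HEADER, more))
        offset += size
    return fragments


for frag in fragment(1600, 1500):
    print(frag)
```

A 1600-byte packet sent through an IP MTU of 1500 becomes one 1500-byte fragment followed by one 120-byte fragment, each with its own 20-byte IP header.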

Now we know the difference between the Egress interface MTU and Ingress interface MTU.

In the near future I’ll cover IOS-XR and JUNOS in part 2, so make sure you add me to your RSS feed.

5 comments:

  1. thanks
    but there is a problem in your first and second depict
    MSS is an optional field on TCP header which is the payload size on the TCP header NOT included the TCP header itself

  2. Very detailed article. Thanks for sharing this.

  3. Thanks a lot. I was confused about Ethernet MTU, IP MTU, this really helps.

  4. major133 is correct, also MAC Src & Dst fields are only 6 bytes.
