Maximum Transmission Unit (MTU) is one of those topics that has always confused a lot of people. In this series I’ll try to explain it, like many before me did, and hopefully clear up some of the confusion.
Let’s first examine how a network device sends a frame, starting with the components needed for an Ethernet frame to be sent successfully.
1. Preamble, Start of Frame and Inter-frame Gap aren’t part of the frame; rather, they help Ethernet frames to be sent and received without errors or collisions.
2. The Ethernet frame encapsulates the protocol that carries the packet inside it, which can be IPv4 (0x0800), IPv6 (0x86DD), ARP (0x0806), etc. It consists of Destination MAC, Source MAC, Frame Type (EtherType) and a CRC for error checking. It’s also worth noting that the CRC check belongs to the media itself, so it can be excluded from the MTU calculations as well (note: Wireshark will not display the CRC (FCS), so when using it, exclude the FCS from your calculation).
3. The IP packet (IPv4 or IPv6) includes the source and destination IPs and refers to the protocol inside it, e.g. TCP, UDP, ICMP, etc.
4. The protocol mentioned in the point above also has a header: TCP = 20 bytes, UDP = 8 bytes, ICMP = 8 bytes.
5. The payload is the actual data itself.
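To make the Layer 2 header from point 2 concrete, here is a minimal Python sketch that packs its three fields. The helper names (`mac_to_bytes`, `ethernet_header`) and the example MAC addresses are my own and only there for illustration.

```python
import struct

# EtherType values mentioned in the list above
ETHERTYPES = {"IPv4": 0x0800, "IPv6": 0x86DD, "ARP": 0x0806}

def mac_to_bytes(mac: str) -> bytes:
    """Convert a colon-separated MAC such as 'c2:00:72:78:00:00' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))

def ethernet_header(dst: str, src: str, protocol: str) -> bytes:
    """Destination MAC (6) + Source MAC (6) + EtherType (2) = 14 bytes."""
    ethertype = struct.pack("!H", ETHERTYPES[protocol])  # 2 bytes, network byte order
    return mac_to_bytes(dst) + mac_to_bytes(src) + ethertype

# Hypothetical addresses, chosen only for the example
header = ethernet_header("ff:ff:ff:ff:ff:ff", "c2:00:72:78:00:00", "IPv4")
print(len(header))          # 14
print(header[12:14].hex())  # 0800
```

The 14 bytes produced here are the frame header that comes up again and again in the MTU arithmetic; the FCS is a separate 4-byte trailer and is not built by this sketch.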
We’ll just discard the Preamble, Start of Frame and Inter-frame Gap, since they aren’t part of the frame itself but a mechanism to differentiate between frames. It was worth mentioning them anyway.
After discarding the non-frame components, here’s what it looks like for an ICMP packet as an example
Frame Sizes
· The minimum Ethernet frame can be 64 bytes
Assuming this is an ICMP packet, let’s see what the payload can be with the minimum Ethernet frame size:
Field | Bytes
Destination MAC | 6
Source MAC | 6
Frame Type | 2
IP Header | 20
ICMP Header | 8
Payload | 18
FCS | 4
Total Frame | 64
Frames smaller than 64 bytes are padded to reach 64 bytes. Meaning, if the Layer 3 data is less than 46 bytes, extra padding bytes are added at Layer 2 (14 bytes of header + 46 bytes of data + 4 bytes of FCS = 64).
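The padding rule is easy to check with a few lines of Python. This is only a sketch of the arithmetic; the constant and function names are mine, and it assumes an untagged frame with a 14-byte header and a 4-byte FCS.

```python
ETH_HEADER = 14  # destination MAC + source MAC + frame type
FCS = 4          # CRC trailer
MIN_L3 = 46      # minimum Layer 3 data carried in a frame

def padding_needed(l3_bytes: int) -> int:
    """Padding bytes added at Layer 2 when the Layer 3 data is too short."""
    return max(0, MIN_L3 - l3_bytes)

def frame_size(l3_bytes: int, include_fcs: bool = True) -> int:
    """Frame size for a given amount of Layer 3 data, padding included."""
    size = ETH_HEADER + l3_bytes + padding_needed(l3_bytes)
    return size + FCS if include_fcs else size

print(padding_needed(36))                 # 10
print(frame_size(36))                     # 64
print(frame_size(36, include_fcs=False))  # 60
```

With `include_fcs=False` the function gives the size you would count in a capture that leaves out the FCS.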
· The maximum Ethernet frame can be 1518 bytes
Taking an ICMP packet as an example again:
Field | Bytes
Destination MAC | 6
Source MAC | 6
Frame Type | 2
IP Header | 20
ICMP Header | 8
Payload | 1472
FCS | 4
Total Frame | 1518
(An 802.1Q VLAN tag, when present, adds another 4 bytes, for a total of 1522.)
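The same arithmetic works for the other protocols. The sketch below uses the header sizes listed earlier (TCP 20, UDP 8, ICMP 8) and assumes a 20-byte IP header with no options and an untagged frame whose last 4 bytes are the FCS; the function names are mine.

```python
ETH_HEADER = 14
FCS = 4
IP_HEADER = 20  # IPv4 header without options
PROTOCOL_HEADER = {"TCP": 20, "UDP": 8, "ICMP": 8}

def max_payload(protocol: str, mtu: int = 1500) -> int:
    """Largest payload that fits in one unfragmented IP packet."""
    return mtu - IP_HEADER - PROTOCOL_HEADER[protocol]

def max_frame(mtu: int = 1500) -> int:
    """Largest untagged frame for a given MTU, FCS included."""
    return ETH_HEADER + mtu + FCS

print(max_payload("ICMP"))  # 1472
print(max_payload("TCP"))   # 1460
print(max_frame())          # 1518
```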
Now why is the default MTU 1500 bytes?
Well, there were several reasons for the 1500-byte MTU limit:
· Early network interface cards didn’t have buffers large enough to hold big frames, so 1500 bytes seemed “ok” at that period of time
· Since Ethernet is a shared medium, the longer the frame is, the longer other devices on the same medium have to wait before being able to transmit data
· The bigger the frame is, the more likely it is to be transmitted with an error and have to be resent, which again introduces delay in the network.
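To put a number on the shared-medium point, here is a rough sketch of how long one frame keeps the wire busy. It assumes a 10 Mbit/s link (classic Ethernet) and ignores the preamble and inter-frame gap; both the function name and the link speed are my assumptions.

```python
def serialization_delay_us(frame_bytes: int, link_bps: float = 10_000_000) -> float:
    """Time the medium is occupied by one frame, in microseconds."""
    return frame_bytes * 8 / link_bps * 1_000_000

print(round(serialization_delay_us(64), 1))    # 51.2
print(round(serialization_delay_us(1518), 1))  # 1214.4
```

So under these assumptions a maximum-size frame holds the medium for a bit more than a millisecond, roughly 24 times longer than a minimum-size one.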
Now let’s check this simple
topology below,
By default, every interface has an MTU of 1500 bytes and an IP MTU of 1500 bytes.
R1#show int f0/0
FastEthernet0/0 is up, line protocol is up
  Hardware is Gt96k FE, address is c200.7278.0000 (bia c200.7278.0000)
  Internet address is 10.1.2.1/24
  MTU 1500 bytes, BW 10000 Kbit/sec, DLY 1000 usec,
R1#show ip int f0/0
FastEthernet0/0 is up, line protocol is up
  Internet address is 10.1.2.1/24
  Broadcast address is 255.255.255.255
  Address determined by setup command
  MTU is 1500 bytes
Let’s ping from R1 to R3 with a
packet size of 36 and see how it goes
R1#ping 10.2.3.3 size 36
Type escape sequence to abort.
Sending 5, 36-byte ICMP Echos to 10.2.3.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/39/60 ms
Now let’s see the captured
traffic
To make things clear here, Wireshark doesn’t capture the FCS, which is 4 bytes. That’s why it sometimes gets confusing when using Wireshark to calculate MTU.
Now let’s do some addition, after pinging with the minimum allowed size of 36 bytes:
Field | Bytes | Note
Destination MAC | 6 |
Source MAC | 6 |
Frame Type | 2 |
IP Header | 20 | total length = 36
ICMP Header | 8 |
Payload | 8 |
Extra Padding | 10 |
Total Frame | 60 |
We can also conclude that the size given to the IOS ping command is the size of the IP packet, meaning the IP header and everything it carries. It excludes the frame headers.
Now let’s try pinging with a packet size of 1500 bytes with the Don’t Fragment (DF) bit set
R1#ping 10.2.3.3 size 1500 df-bit
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 10.2.3.3, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/41/48 ms
So we can now conclude one thing: with the MTU on a Cisco IOS interface set to 1500, the largest frame it sends is actually 1514 bytes, not 1500 (it’s even 1518, but we decided to neglect the FCS for now). The thing is, Cisco excludes the frame header from the MTU command.
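Here is a small sketch of that relationship between the ping size and the frame. It treats the ping size as the whole IP packet, as concluded above, and assumes a 20-byte IP header, an untagged frame and a size of at least 46 bytes (so no padding is involved). The function name `describe_ping` is mine.

```python
ETH_HEADER = 14
FCS = 4
IP_HEADER = 20
ICMP_HEADER = 8

def describe_ping(size: int) -> dict:
    """Break down a ping of 'size' bytes, where size is the whole IP packet."""
    return {
        "icmp_payload": size - IP_HEADER - ICMP_HEADER,
        "frame_without_fcs": ETH_HEADER + size,     # what a capture without FCS shows
        "frame_with_fcs": ETH_HEADER + size + FCS,  # what is actually sent
    }

result = describe_ping(1500)
print(result["icmp_payload"])       # 1472
print(result["frame_without_fcs"])  # 1514
print(result["frame_with_fcs"])     # 1518
```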
Let’s try pinging with a packet
size which is larger than 1500 (Larger than both MTU and IP MTU) with a DF-Bit
set
R1#ping 10.2.3.3 size 1501 df-bit
Type escape sequence to abort.
Sending 5, 1501-byte ICMP Echos to 10.2.3.3, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)
It’s failing, but can we increase
the IP MTU on F0/0 interface on R1?
R1(config)#int f0/0
R1(config-if)#ip mtu ?
  <68-1500>  MTU (bytes)
It seems that we can’t, but what
about MTU on interface F0/0?
R1(config-if)#mtu ?
  <64-1600>  MTU size in bytes
R1(config-if)#mtu 1600
Now let’s try pinging R3 again
R1#ping 10.2.3.3 size 1501 df-bit
Type escape sequence to abort.
Sending 5, 1501-byte ICMP Echos to 10.2.3.3, timeout is 2 seconds:
Packet sent with the DF bit set
M.M.M
You can see that the interface is now capable of sending frames larger than 1500 bytes, but the ping still fails: the “M” in the output means the packet could not be fragmented. In plain English, the physical hardware of the interface is capable of sending frames up to 1600 bytes, but it can’t send IP packets larger than 1500 bytes because the configured IP MTU is only 1500. Of course this is platform specific; other types of hardware are capable of MTUs up to 9192 bytes.
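The behaviour seen in these pings can be summed up as a simple decision. This is a deliberately simplified model of what happens on the sending side, not IOS’s actual logic, and the function name is mine.

```python
def egress_decision(packet_bytes: int, ip_mtu: int, df_bit: bool) -> str:
    """Simplified model: what happens to an IP packet leaving an interface."""
    if packet_bytes <= ip_mtu:
        return "send"
    if df_bit:
        return "drop"  # too big and not allowed to fragment
    return "fragment"

print(egress_decision(1500, 1500, df_bit=True))   # send
print(egress_decision(1501, 1500, df_bit=True))   # drop
print(egress_decision(1501, 1500, df_bit=False))  # fragment
```

Note that the interface MTU doesn’t appear in the model at all: for an IP packet it is the IP MTU that decides, which is why raising the MTU alone didn’t help.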
But now, since the MTU is 1600, the IP MTU can be increased to 1600 as well, which leads us to the fact that IP MTU <= MTU (less than or equal).
R1(config)#int f0/0
R1(config-if)#ip mtu ?
  <68-1600>  MTU (bytes)
We can now set our IP MTU as high as 1600 bytes. Now let’s try to ping with an unfragmented packet larger than 1500 again
R1#ping 10.1.2.2 size 1600 df-bit
Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 10.1.2.2, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Now that’s more like it. Again, to clear any confusion: the reason MTU and IP MTU can be configured with the same size is that Cisco’s IOS doesn’t count the 14 bytes of Layer 2 headers in the MTU command. So even if I ping with the maximum MTU, which is 1600, the router still has room for the 14 bytes of Layer 2 headers.
Another question might arise: then why do I need both the MTU and IP MTU commands? Shouldn’t I just increase the MTU and move along? Well, the reason is that you might want to give room for headers that sit below Layer 3 while keeping the Layer 3 packets the same size. Meaning, I might want to give room for MPLS labels, Q-in-Q and VLAN tags without increasing the Layer 3 size. Only then will I increase the MTU but keep the IP MTU as it is.
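As a rough illustration of that idea, the sketch below works out how much bigger the MTU has to be to carry a stack of MPLS labels on top of an unchanged IP MTU. Each MPLS label is 4 bytes; the assumption that the labels count against the interface MTU and not the IP MTU follows the paragraph above, and the function names are mine.

```python
MPLS_LABEL = 4  # bytes per MPLS label

def required_mtu(ip_mtu: int, mpls_labels: int = 0) -> int:
    """MTU needed to carry a full-size IP packet plus its label stack."""
    return ip_mtu + mpls_labels * MPLS_LABEL

def is_valid(mtu: int, ip_mtu: int) -> bool:
    """The IP MTU must be less than or equal to the MTU."""
    return ip_mtu <= mtu

print(required_mtu(1500, mpls_labels=2))  # 1508
print(is_valid(1600, 1500))               # True
print(is_valid(1500, 1600))               # False
```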
All the previous tests were done from R1, which is where I was configuring the MTU, but how would a transit router react to a bigger MTU or IP MTU or both? (And I have to confess that all that time, R2 and R3 had the maximum MTU and IP MTU configured, which is 1600.)
Well, this is very interesting, because when a router is sending a packet out of an interface, it will fragment the packet even if it is 10 times bigger than its MTU, but if it’s receiving a packet larger than its MTU, it will just discard it.
We can test that by simply setting R1
F0/0 interface with MTU of 1600 and IP MTU of 1600 then setting R2 F0/0
interface to MTU of 1500 and IP MTU of 1500
Now let’s ping with 1600 from R1 but this
time I will NOT set the DF-Bit which means I’m giving R2 the liberty to do
whatever it wants with the packet
R1#ping 10.1.2.2 size 1600
Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 10.1.2.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
R2 isn’t even replying with fragmented packets to R1, which shows that an interface can’t receive a packet bigger than its MTU
Now let’s set R2 MTU to 1600 and IP MTU
to 1500
R2(config)#int f0/0
R2(config-if)#mtu 1600
R2(config-if)#ip mtu 1500
We’ll ping with 1600 bytes from R1
without DF-Bit again
R1#ping 10.1.2.2 size 1600
Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 10.1.2.2, timeout is 2 seconds:
!!!!!
This time, the packet managed to get through R2’s interface, and even though the IP packet size is 1600, R2 managed to fragment the reply packets to fit its IP MTU of 1500
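To see what that fragmentation looks like in numbers, here is a minimal sketch of how a 1600-byte IPv4 packet is split for an IP MTU of 1500. It assumes a 20-byte IP header with no options and uses the IPv4 rule that every fragment except the last carries a multiple of 8 data bytes. The function name is mine, and the sizes are computed, not taken from a capture.

```python
IP_HEADER = 20  # IPv4 header without options

def fragment_sizes(packet_bytes: int, ip_mtu: int) -> list:
    """Return the total length (header included) of each IPv4 fragment."""
    if packet_bytes <= ip_mtu:
        return [packet_bytes]  # fits as it is, no fragmentation
    data = packet_bytes - IP_HEADER
    # Data per fragment must be a multiple of 8 bytes, except in the last one
    per_fragment = (ip_mtu - IP_HEADER) // 8 * 8
    sizes = []
    while data > 0:
        chunk = min(per_fragment, data)
        sizes.append(IP_HEADER + chunk)  # every fragment gets its own IP header
        data -= chunk
    return sizes

print(fragment_sizes(1600, 1500))  # [1500, 120]
```

The two fragments add up to 1620 bytes, not 1600, because the second fragment needs its own 20-byte IP header.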
Now we know the difference between the
Egress interface MTU and Ingress interface MTU.
In the near future I’ll add IOS-XR and JUNOS as part 2, so make sure you add me to your RSS feed.
thanks
but there is a problem in your first and second depict
MSS is an optional field on TCP header which is the payload size on the TCP header NOT included the TCP header itself
Very detailed article. Thanks for sharing this.
Thanks a lot. I was confused about Ethernet MTU, IP MTU, this really helps.
great explanation
major133 is correct, also MAC Src & Dst fields are only 6 bytes.