on 01-17-2011 04:24 AM
Links on routers have an MTU. Outgoing packets, including OSPF packets, cannot be larger than the interface MTU unless they are fragmented. Let’s have a look at how OSPF behaves with respect to packet size and MTU.
This is what RFC 2328 (OSPF version 2 specification) says about OSPF packets and MTU.
A.1 Encapsulation of OSPF packets
OSPF runs directly over the Internet Protocol's network layer. OSPF
packets are therefore encapsulated solely by IP and local data-link
headers.
OSPF does not define a way to fragment its protocol packets, and
depends on IP fragmentation when transmitting packets larger than
the network MTU. If necessary, the length of OSPF packets can be up
to 65,535 bytes (including the IP header). The OSPF packet types
that are likely to be large (Database Description Packets, Link
State Request, Link State Update, and Link State Acknowledgment
packets) can usually be split into several separate protocol
packets, without loss of functionality. This is recommended; IP
fragmentation should be avoided whenever possible.
Remember that a Link State (LS) Update packet can carry a single LSA, but it can also carry many LSAs. Placing many LSAs in one LS Update packet is called packing LSAs into one LS Update packet.
Here’s a DBD (Database Description) packet, as specified in RFC 2328. This packet describes the contents of the OSPF link-state database.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Version #   |       2       |         Packet length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Router ID                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Area ID                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |             AuType            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Authentication                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Authentication                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Interface MTU         |    Options    |0|0|0|0|0|I|M|MS
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     DD sequence number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-                                                             -+
|                                                               |
+-                      An LSA Header                          -+
|                                                               |
+-                                                             -+
|                                                               |
+-                                                             -+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              ...                              |
Interface MTU is defined as “the size in bytes of the largest IP datagram that can be sent out the associated interface, without fragmentation”. Routers attached to a link therefore exchange their interface MTU values in DBD packets while the OSPF adjacency is being initialized.
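To make the field layout concrete, here is a minimal Python sketch that builds an empty DBD (no LSA headers) and reads the Interface MTU back out with the standard `struct` module. The helper names `build_dbd` and `parse_dbd_mtu` are hypothetical, the checksum is left at 0 and authentication is null, so this is an illustration of the RFC 2328 layout, not a packet a real router would accept.

```python
import struct

# OSPFv2 common header (RFC 2328, A.3.1): version, type, packet length,
# router ID, area ID, checksum, AuType, 64-bit authentication = 24 bytes.
OSPF_HEADER = struct.Struct("!BBH4s4sHH8s")
# Fixed part of the DBD body (RFC 2328, A.3.3): Interface MTU, Options,
# flags byte (I/M/MS bits), DD sequence number = 8 bytes.
DBD_FIXED = struct.Struct("!HBBI")

DBD_TYPE = 2
FLAG_I, FLAG_M, FLAG_MS = 0x04, 0x02, 0x01

def build_dbd(router_id: bytes, interface_mtu: int, options: int,
              flags: int, seq: int) -> bytes:
    """Hypothetical helper: build an empty DBD in area 0.0.0.0.

    Checksum stays 0 and authentication is null (illustration only).
    """
    length = OSPF_HEADER.size + DBD_FIXED.size
    header = OSPF_HEADER.pack(2, DBD_TYPE, length, router_id, bytes(4),
                              0, 0, bytes(8))
    return header + DBD_FIXED.pack(interface_mtu, options, flags, seq)

def parse_dbd_mtu(packet: bytes) -> int:
    """Hypothetical helper: return the Interface MTU field of a DBD."""
    version, ptype, *_ = OSPF_HEADER.unpack_from(packet, 0)
    if version != 2 or ptype != DBD_TYPE:
        raise ValueError("not an OSPFv2 Database Description packet")
    mtu, _options, _flags, _seq = DBD_FIXED.unpack_from(packet, OSPF_HEADER.size)
    return mtu

# Values modelled on the debug output: opt 0x52, flag 0x7 (I|M|MS), mtu 2000.
pkt = build_dbd(bytes([10, 100, 1, 2]), 2000, 0x52,
                FLAG_I | FLAG_M | FLAG_MS, 0x2124)
print(len(pkt), parse_dbd_mtu(pkt))  # 32 2000
```

An empty DBD is 24 + 8 = 32 bytes, which matches the "len 32" DBD packets with flag 0x7 that appear in the debug output.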
Section 10.6 of RFC 2328 says:
If the Interface MTU field in the Database Description packet
indicates an IP datagram size that is larger than the router can
accept on the receiving interface without fragmentation, the
Database Description packet is rejected.
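The section 10.6 rule boils down to a single comparison. The sketch below applies it to the two routers of the following debug example (1600 and 2000 bytes); the function name is hypothetical and it only models the RFC rule as quoted, not any particular IOS implementation.

```python
def dbd_passes_mtu_check(neighbor_mtu: int, local_mtu: int) -> bool:
    """RFC 2328 section 10.6: reject a DBD whose Interface MTU is larger
    than what the receiving interface accepts without fragmentation."""
    return neighbor_mtu <= local_mtu

# The two routers in the debug example: one at 1600 bytes, one at 2000 bytes.
for local, neighbor in [(1600, 2000), (2000, 1600)]:
    verdict = "accepted" if dbd_passes_mtu_check(neighbor, local) else "rejected"
    print(f"local MTU {local}, DBD advertises {neighbor}: {verdict}")
```

Under this rule only the router with the smaller MTU rejects, but because DBD packets acknowledge each other, discarding the neighbor's DBDs is enough to stall the exchange.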
With "debug ip ospf adj" enabled, we can see these DBD packets arrive. In the following example, the two OSPF neighbors have mismatched MTU values: this router has an interface MTU of 1600, while the neighboring OSPF router has an interface MTU of 2000.
On this router:
OSPF: Rcv DBD from 10.100.1.2 on GigabitEthernet0/1 seq 0x2124 opt 0x52 flag 0x2 len 1452 mtu 2000 state EXSTART
OSPF: Nbr 10.100.1.2 has larger interface MTU
On the neighboring router:
OSPF: Rcv DBD from 10.100.100.1 on GigabitEthernet0/1 seq 0x89E opt 0x52 flag 0x7 len 32 mtu 1600 state EXCHANGE
OSPF: Nbr 10.100.100.1 has smaller interface MTU
The DBD packets are retransmitted continuously, and eventually the OSPF adjacency is torn down.
OSPF: Send DBD to 10.100.1.2 on GigabitEthernet0/1 seq 0x9E6 opt 0x52 flag 0x7 len 32
OSPF: Retransmitting DBD to 10.100.1.2 on GigabitEthernet0/1 [10]
OSPF: Send DBD to 10.100.1.2 on GigabitEthernet0/1 seq 0x9E6 opt 0x52 flag 0x7 len 32
OSPF: Retransmitting DBD to 10.100.1.2 on GigabitEthernet0/1 [11]
%OSPF-5-ADJCHG: Process 1, Nbr 10.100.1.2 on GigabitEthernet0/1 from EXSTART to DOWN, Neighbor Down: Too many retransmissions
Before CSCse01519, OSPF in IOS built OSPF packets up to a maximum of 1500 bytes, regardless of the interface MTU. So, if the interface MTU was bigger than 1500 bytes, OSPF would still pack at most 1500 bytes into an OSPF packet. This is somewhat inefficient, because OSPF could send bigger packets on the link and achieve greater throughput. There is one exception: if a single LSA is larger than 1500 bytes, OSPF builds a packet for that LSA whatever its size, because OSPF cannot fragment an LSA. The IP stack of the router then fragments the packet to fit the MTU of the outgoing interface. This typically occurs when an OSPF router has many links, so that its router LSA becomes bigger than the link MTU.
Likewise, if the MTU of the outgoing interface is smaller than 1500 bytes, the OSPF process would still build OSPF packets of up to 1500 bytes, and the IP stack of the router would fragment them into smaller IP packets to fit the MTU of the outgoing link. A typical example is an IPSec tunnel between two routers running OSPF: the encapsulation overhead of the tunnel lowers the MTU below 1500 bytes. OSPF builds packets of up to 1500 bytes, and they are fragmented before the router transmits them. This is another inefficiency.
After CSCse01519, OSPF in IOS can build OSPF packets larger than 1500 bytes when the MTU of the outgoing interface is greater than 1500 bytes. This makes transmission more efficient, because more information fits into one larger packet. For example, if an OSPF router running IOS with CSCse01519 needs to send many external LSAs to a neighbor, it can pack more of them into each LS Update packet.
CSCse01519 also allows OSPF to build packets smaller than 1500 bytes. In some scenarios, such as the IPSec tunnel example above, the MTU between two OSPF neighbors is lower than 1500 bytes. In that case, OSPF sends packets smaller than 1500 bytes and avoids IP fragmentation, except when a single LSA is bigger than the interface MTU.
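The difference between the two behaviors can be sketched as a greedy packing model: keep adding LSAs to an LS Update until the next one would exceed a size limit, never splitting an LSA. This is a hypothetical model (`pack_lsas` is not an IOS function), with the limit fixed at 1500 before CSCse01519 and equal to the interface MTU after it. It assumes 36-byte AS-external LSAs without TOS entries and a 20-byte IPv4 header.

```python
from typing import List

IP_HDR_LEN = 20     # IPv4 header without options
OSPF_HDR_LEN = 24   # OSPFv2 common header
LSU_COUNT_LEN = 4   # "# LSAs" field of a Link State Update

def pack_lsas(lsa_sizes: List[int], limit: int) -> List[int]:
    """Greedily pack LSAs into LS Updates; return the IP datagram sizes."""
    overhead = IP_HDR_LEN + OSPF_HDR_LEN + LSU_COUNT_LEN
    packets: List[int] = []
    current = overhead
    for size in lsa_sizes:
        # Start a new packet if this LSA does not fit, but never split an LSA:
        # an LSA larger than the limit goes alone into an oversized packet.
        if current > overhead and current + size > limit:
            packets.append(current)
            current = overhead
        current += size
    if current > overhead:
        packets.append(current)
    return packets

externals = [36] * 100  # 100 AS-external LSAs of 36 bytes each (assumption)
print("fixed 1500 cap:", pack_lsas(externals, 1500))
print("cap = MTU 2000:", pack_lsas(externals, 2000))
```

With a 1500-byte cap the model needs three packets (1488, 1488 and 768 bytes), with a 2000-byte cap only two (1992 and 1704 bytes). Forty 36-byte external LSAs give an OSPF packet length of 24 + 4 + 40 × 36 = 1468 bytes, the same length as the largest LS Update in the debug output later in this document.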
Here's a specific example of how upgrading an OSPF router to IOS with CSCse01519 can uncover an OSPF MTU issue.
Many networks have OSPF neighbors that are connected through a Layer 2 switched network or a transport network, such as an L2VPN service or an SDH/SONET network. These transport networks can have different MTU settings than the routers running OSPF.
While the MTU setting on all routers should reflect the true MTU, mistakes are common and can go unnoticed.
Here's an example network, with two routers R1 and R2 running OSPF and they are connected through a Layer 2 switch.
The issue occurs often when the routers have Ethernet interfaces with a configurable MTU, as is the case here: the GigabitEthernet interfaces have an MTU of 2000, while the MTU of the Layer 2 switch is only 1500 bytes.
As long as the data traffic is never bigger than 1500 bytes, there is no problem running IOS without CSCse01519, because OSPF packets are never larger than 1500 bytes. The exception is a single LSA larger than 1500 bytes: in that case the OSPF process on R1 or R2 builds a Link State Update packet larger than 1500 bytes and transmits it. If this packet is, say, 1800 bytes, the Layer 2 switch between the routers drops it.
Assume that R2 has enough networks that its locally originated LSAs become very large, so that an LS Update packet can potentially be larger than the interface MTU.
If these networks are advertised through a network command covering them, they show up in the router LSA of R2. R2 builds a router LSA bigger than 2000 bytes and transmits it, and IP fragments it to fit the 2000-byte interface MTU. The Layer 2 switch, however, drops these fragments. OSPF then retransmits the packet endlessly, and the OSPF adjacency never becomes FULL. So the issue is discovered immediately, even with IOS without CSCse01519.
If these networks are originated by "redistribute connected", they show up in external LSAs instead. Without CSCse01519, OSPF packs external LSAs into LS Update packets of at most 1500 bytes.
In this case, the OSPF adjacency reaches the FULL state: both routers advertise an interface MTU of 2000 in their DBD packets, and no OSPF packet exceeds 1500 bytes. The inadequate underlying MTU is not discovered immediately.
Once one router is upgraded to IOS with CSCse01519, the issue is discovered.
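Why one case fails immediately and the other stays hidden can be sketched by checking the largest IP packet the router puts on the wire against the switch MTU. The datagram sizes are hypothetical: 2248 bytes for an oversized router LSA, and 1488 and 1992 bytes as example LS Update sizes under a 1500-byte cap and a 2000-byte MTU cap. The helpers `largest_piece` and `delivered` are illustrative names, not IOS functions.

```python
IHL = 20  # IPv4 header length without options

def largest_piece(datagram_len: int, interface_mtu: int) -> int:
    """Largest IPv4 packet the router sends for one datagram."""
    if datagram_len <= interface_mtu:
        return datagram_len
    # First fragment: its payload is rounded down to a multiple of 8 bytes.
    return IHL + (interface_mtu - IHL) // 8 * 8

def delivered(datagram_len: int, interface_mtu: int, switch_mtu: int) -> bool:
    # The Layer 2 switch drops any packet larger than its own MTU.
    return largest_piece(datagram_len, interface_mtu) <= switch_mtu

INTERFACE_MTU, SWITCH_MTU = 2000, 1500
scenarios = {
    "router LSA > 2000 bytes (network command)": 2248,
    "external LSAs, 1500-byte packing": 1488,
    "external LSAs, MTU packing (CSCse01519)": 1992,
}
for name, size in scenarios.items():
    ok = delivered(size, INTERFACE_MTU, SWITCH_MTU)
    print(f"{name}: {'delivered' if ok else 'dropped by switch'}")
```

Only the 1500-byte packing gets through. Note that the first fragment of the 2248-byte datagram is 1996 bytes rather than exactly 2000, because fragment payloads other than the last must be multiples of 8 bytes.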
Let's see what happens.
First both routers run IOS without CSCse01519.
When the adjacency builds, we see that R1 never receives an OSPF packet bigger than 1500 bytes, even though the MTU of the interfaces is 2000.
We enable "debug ip ospf packet".
OSPF: rcv. v:2 t:1 l:48 rid:10.100.1.2
aid:0.0.0.0 chk:72CF aut:0 auk: from GigabitEthernet0/1
...
OSPF: rcv. v:2 t:4 l:1468 rid:10.100.1.2
aid:0.0.0.0 chk:8389 aut:0 auk: from GigabitEthernet0/1
OSPF: rcv. v:2 t:4 l:136 rid:10.100.1.2
...
l:xx in the debug output shows the length of the OSPF packet. The biggest OSPF packet R2 sent was 1468 bytes.
t:4 means that the OSPF packet type is "Link State Update". This table from RFC 2328, section 4.3, lists the different OSPF packet types.
Type  Packet name            Protocol function
__________________________________________________________
1     Hello                  Discover/maintain neighbors
2     Database Description   Summarize database contents
3     Link State Request     Database download
4     Link State Update      Database update
5     Link State Ack         Flooding acknowledgment
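When a debug capture is long, picking out the largest packet by eye is tedious. The sketch below pulls the type (t:), length (l:) and router ID out of the "OSPF: rcv." lines and maps the type through the table above. The regular expression assumes exactly the line format shown in this document; other IOS releases may print it differently, and `parse_rcv` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

OSPF_TYPES: Dict[int, str] = {
    1: "Hello", 2: "Database Description", 3: "Link State Request",
    4: "Link State Update", 5: "Link State Ack",
}
# Assumed format: "OSPF: rcv. v:2 t:4 l:1468 rid:10.100.1.2"
RCV = re.compile(r"OSPF: rcv\. v:(\d+) t:(\d+) l:(\d+) rid:(\S+)")

def parse_rcv(lines: List[str]) -> List[Tuple[str, int, str]]:
    """Return (packet type name, OSPF length, router ID) per received packet."""
    out = []
    for line in lines:
        m = RCV.search(line)
        if m:
            ptype = OSPF_TYPES.get(int(m.group(2)), "unknown")
            out.append((ptype, int(m.group(3)), m.group(4)))
    return out

log = """OSPF: rcv. v:2 t:1 l:48 rid:10.100.1.2
 aid:0.0.0.0 chk:72CF aut:0 auk: from GigabitEthernet0/1
OSPF: rcv. v:2 t:4 l:1468 rid:10.100.1.2
 aid:0.0.0.0 chk:8389 aut:0 auk: from GigabitEthernet0/1
OSPF: rcv. v:2 t:4 l:136 rid:10.100.1.2""".splitlines()

packets = parse_rcv(log)
largest = max(packets, key=lambda p: p[1])
print(largest)  # ('Link State Update', 1468, '10.100.1.2')
```

Adding the 20-byte IPv4 header to the 1468-byte OSPF length gives a 1488-byte IP datagram, below the 1500 bytes the switch can pass.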
We see that the OSPF adjacency reaches the full state.
R1#show ip ospf neighbor gigabitEthernet 0/1
Neighbor ID Pri State Dead Time Address Interface
10.100.1.2 0 FULL/ - 00:00:34 10.1.1.2 GigabitEthernet0/1
R2#show ip ospf neighbor gigabitEthernet 0/1
Neighbor ID Pri State Dead Time Address Interface
10.100.100.1 0 FULL/ - 00:00:34 10.1.1.1 GigabitEthernet0/1
We upgrade IOS on R2 to an IOS with CSCse01519.
R2#show ip ospf neighbor gigabitEthernet 0/1
Neighbor ID Pri State Dead Time Address Interface
10.100.100.1 0 LOADING/ - 00:00:33 10.1.1.1 GigabitEthernet0/1
R2#show ip ospf neighbor gigabitEthernet 0/1 detail
Neighbor 10.100.100.1, interface address 10.1.1.1
In the area 0 via interface GigabitEthernet0/1
Neighbor priority is 0, State is LOADING, 5 state changes
DR is 0.0.0.0 BDR is 0.0.0.0
Options is 0x12 in Hello (E-bit L-bit )
Options is 0x52 in DBD (E-bit L-bit O-bit)
LLS Options is 0x1 (LR)
Dead timer due in 00:00:39
Neighbor is up for 00:00:49
Index 1/1, retransmission queue length 0, number of retransmission 0
First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
Last retransmission scan length is 0, maximum is 0
Last retransmission scan time is 0 msec, maximum is 0 msec
Number of retransmissions for last link state request packet 9
Poll due in 00:00:00
R2#show ip ospf neighbor gigabitEthernet 0/1 detail
Neighbor 10.100.100.1, interface address 10.1.1.1
In the area 0 via interface GigabitEthernet0/1
Neighbor priority is 0, State is LOADING, 5 state changes
DR is 0.0.0.0 BDR is 0.0.0.0
Options is 0x12 in Hello (E-bit L-bit )
Options is 0x52 in DBD (E-bit L-bit O-bit)
LLS Options is 0x1 (LR)
Dead timer due in 00:00:33
Neighbor is up for 00:02:06
Index 1/1, retransmission queue length 0, number of retransmission 0
First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
Last retransmission scan length is 0, maximum is 0
Last retransmission scan time is 0 msec, maximum is 0 msec
Number of retransmissions for last link state request packet 25
Poll due in 00:00:03
%OSPF-5-ADJCHG: Process 1, Nbr 10.100.100.1 on GigabitEthernet0/1 from LOADING to DOWN, Neighbor Down: Too many retransmissions
The OSPF adjacency does not reach the FULL state: it is stuck in LOADING, and we see retransmissions. OSPF gives up after 25 retransmissions and then tries to establish the adjacency again, only to run into the same issue. This continues endlessly.
By upgrading only one router (R2), we uncover a previously hidden issue: the underlying MTU is smaller than the one used by the OSPF routers.
Once the switch MTU is raised to pass 2000-byte packets, an OSPF packet bigger than 1500 bytes (here a 1980-byte Link State Request, t:3) is transmitted fine.
R1#
OSPF: rcv. v:2 t:3 l:1980 rid:10.100.1.2
aid:0.0.0.0 chk:AC5B aut:0 auk: from GigabitEthernet0/1
To check for underlying MTU issues, always ping the OSPF neighbor's IP address with a datagram size equal to the interface MTU and the DF bit set.
To discover the value of the underlying MTU, run this ping while sweeping the size. With a sweep interval of 1, the largest size that got a reply is the sweep minimum plus the number of "!" marks minus one. In this case, the last echo reply we got back from the ping command had a size of 1500 bytes.
R2#ping
Protocol [ip]:
Target IP address: 10.1.1.1
Repeat count [5]: 1
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: yes
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: yes
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]: yes
Sweep min size [36]: 1460
Sweep max size [18024]: 1540
Sweep interval [1]:
Type escape sequence to abort.
Sending 81, [1460..1540]-byte ICMP Echos to 10.1.1.1, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.............................
...........
Success rate is 49 percent (40/81), round-trip min/avg/max = 1/1/4 ms
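The counting step can be sketched as a small function. It assumes a repeat count of 1 and a sweep interval of 1 (as in the ping above), and that every size up to the path MTU gets a reply, so only the leading run of "!" is counted. `largest_reply_size` is a hypothetical helper, and the example sweep is made up rather than taken from the output above.

```python
def largest_reply_size(sweep_min: int, output: str) -> int:
    """Largest datagram size that got a reply in a DF-bit ping sweep.

    Assumes one echo per size, a sweep interval of 1, and contiguous
    replies: only the leading run of '!' is counted.
    """
    marks = "".join(c for c in output if c in "!.")  # ignore line wraps
    replies = len(marks) - len(marks.lstrip("!"))
    if replies == 0:
        raise ValueError("no reply even at the smallest size in the sweep")
    return sweep_min + replies - 1

# Hypothetical sweep from 1400 bytes: 101 replies, then 20 timeouts.
print(largest_reply_size(1400, "!" * 101 + "\n" + "." * 20))  # 1500
```

Counting only the leading run keeps a single lost echo at a larger size from inflating the result.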
A very good article describing the problems.
One subtle thing to notice is how MTUs are defined, especially on L3 switches:
MTU size is for 10-100 Mbit interfaces
Jumbo MTU is for 1-10 Gbit interfaces
Routing MTU is for OSPF (and I guess for other routing protocols as well)
Especially on L3 switches, you may need an extended MTU for switching/trunking purposes, while it may be necessary to reduce the routing MTU. This is especially important when mixing switch models such as the C3550, C3560 and C3750, as they behave differently.
Anyway, this article does give a good understanding of _why_ the problem exists.
Nice and helpful post. Thanks :-)
great explanation, thanks for the article.
Hello Luc,
I have a somewhat different understanding of the above topic for type 1 and type 2 LSAs. I agree that the maximum size of a type 1 or type 2 LSA can be 65K (as the length field is 2 bytes). I also agree that the device needs to build the complete LSA (router and network) without fragmenting it. But I don't think the IP layer can then fragment this packet if the interface MTU is smaller than the LSA size. There is no field in the type 1 and type 2 LSAs that can be used to reassemble a fragmented LSA.
For example, if the Type 1 LSA generated by the device is more than 1500 bytes and the link MTU is only 1500 bytes, IP cannot just fragment the packet. Even if it did, the LSA would be corrupted when it is received at the other end.
I think this can be done only in ISIS (speaking just about link-state protocols). An ISIS LSP supports a maximum of 255 fragments, which can be reassembled at the receiving end. Since each fragment has its own checksum, the fragments can also be verified individually.
Regards,
Shreeram
I do have a question: if the IP layer can fragment the packets, then why do we get an OSPF neighborship issue when there is an interface MTU mismatch?
Hi Shreeram,
IP can fragment OSPF packets.
Here are two routers, R1 and R2, both with MTU 1500 on the Ethernet interface between them.
R1 has many OSPF-enabled interfaces, so that the Router LSA of R1 becomes bigger than 1500 bytes.
R1#show ip int et 0/0
Ethernet0/0 is up, line protocol is up
Internet address is 10.1.1.1/24
Broadcast address is 255.255.255.255
Address determined by setup command
MTU is 1500 bytes <<<<<<
A capture on the wire when OSPF exchanges the router LSA of R1 shows:
Frame 37 (1514 bytes on wire, 1514 bytes captured)
Ethernet II, Src: aa:bb:cc:00:01:00, Dst: 01:00:5e:00:00:05
Internet Protocol, Src Addr: 10.1.1.1 (10.1.1.1), Dst Addr: 224.0.0.5 (224.0.0.5)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6; ECN: 0x00)
Total Length: 1500 <<<<<<
Identification: 0x0184 (388) <<<<<<
Flags: 0x02
.0.. = Don't fragment: Not set
..1. = More fragments: Set <<<<<<
Fragment offset: 0
Time to live: 1
Protocol: OSPF IGP (0x59)
Header checksum: 0xa67e (correct)
Source: 10.1.1.1 (10.1.1.1)
Destination: 224.0.0.5 (224.0.0.5)
Open Shortest Path First
OSPF Header
OSPF Version: 2
Message Type: LS Update (4)
Packet Length: 1528 <<<<<<
Source OSPF Router: 10.100.1.1 (10.100.1.1)
Area ID: 0.0.0.0 (Backbone)
Packet Checksum: 0xf490
Auth Type: Null
Auth Data (none)
[Unreassembled Packet: OSPF]
Frame 38 (82 bytes on wire, 82 bytes captured)
Ethernet II, Src: aa:bb:cc:00:01:00, Dst: 01:00:5e:00:00:05
Internet Protocol, Src Addr: 10.1.1.1 (10.1.1.1), Dst Addr: 224.0.0.5 (224.0.0.5)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6; ECN: 0x00)
Total Length: 68
Identification: 0x0184 (388) <<<<<<
Flags: 0x00
.0.. = Don't fragment: Not set
..0. = More fragments: Not set
Fragment offset: 1480
Time to live: 1
Protocol: OSPF IGP (0x59)
Header checksum: 0xcb5d (correct)
Source: 10.1.1.1 (10.1.1.1)
Destination: 224.0.0.5 (224.0.0.5)
Data (48 bytes)
0000 0a c8 01 03 ff ff ff ff 03 00 00 01 0a c8 01 02 ................
0010 ff ff ff ff 03 00 00 01 0a c8 01 01 ff ff ff ff ................
0020 03 00 00 01 0a 64 01 01 ff ff ff ff 03 00 00 01 .....d..........
The router LSA of R1 is bigger than 1500 bytes and was fragmented by IPv4.
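The two fragments in the capture follow directly from IPv4 fragmentation arithmetic (RFC 791): every fragment except the last carries a multiple of 8 payload bytes. This sketch (the `fragment` helper is hypothetical, assuming a 20-byte IP header without options) reproduces the Total Length, More fragments and Fragment offset values for the 1528-byte OSPF packet on a 1500-byte MTU.

```python
from typing import List, Tuple

def fragment(payload_len: int, mtu: int, ihl: int = 20) -> List[Tuple[int, int, bool]]:
    """Split an IPv4 payload; return (offset in bytes, total length, MF flag)."""
    max_chunk = (mtu - ihl) // 8 * 8  # non-final fragments carry multiples of 8
    frags = []
    offset = 0
    while payload_len - offset > max_chunk:
        frags.append((offset, ihl + max_chunk, True))
        offset += max_chunk
    frags.append((offset, ihl + payload_len - offset, False))
    return frags

# OSPF LS Update from the capture: Packet Length 1528, interface MTU 1500.
for off, total, mf in fragment(1528, 1500):
    print(f"offset {off:5d}  total length {total:5d}  MF={'set' if mf else 'not set'}")
```

This prints a 1500-byte first fragment at offset 0 with MF set and a 68-byte second fragment at offset 1480 with MF not set, as in frames 37 and 38. Both fragments share the Identification value 0x0184, which is what lets the receiver reassemble the OSPF packet before OSPF ever sees it.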
The router LSA of R1 will be stored on R2. We can see that the size of the LSA is bigger than 1500 bytes.
R2#show ip ospf database router 10.100.1.1
OSPF Router with ID (10.100.1.2) (Process ID 1)
Router Link States (Area 0)
LS age: 4
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 10.100.1.1
Advertising Router: 10.100.1.1
LS Seq Number: 80000022
Checksum: 0x2CF4
Length: 1536 <<<<<<
Number of Links: 126
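The 1536-byte length follows from the router LSA format in RFC 2328 (A.4.1 and A.4.2): a 20-byte LSA header, 4 bytes of flags and link count, and 12 bytes per link when no TOS metrics are present. The sketch below (hypothetical helper names, assuming a 20-byte IPv4 header) checks that arithmetic and estimates how many links fit before the router LSA alone forces IP fragmentation on a 1500-byte MTU.

```python
LSA_HEADER = 20        # common LSA header (RFC 2328, A.4.1)
ROUTER_LSA_FIXED = 4   # flags (V/E/B bits) and "# links" fields (A.4.2)
LINK_ENTRY = 12        # Link ID, Link Data, Type, # TOS, metric
TOS_ENTRY = 4          # optional per-TOS metric entry
LSU_OVERHEAD = 20 + 24 + 4  # IPv4 header + OSPF header + "# LSAs" field

def router_lsa_length(links: int, tos_per_link: int = 0) -> int:
    """Length of a router LSA with the given number of links."""
    return LSA_HEADER + ROUTER_LSA_FIXED + links * (LINK_ENTRY + TOS_ENTRY * tos_per_link)

def max_links_unfragmented(mtu: int) -> int:
    """Most links (no TOS) whose router LSA fits alone in one LS Update
    without IP fragmentation."""
    return (mtu - LSU_OVERHEAD - LSA_HEADER - ROUTER_LSA_FIXED) // LINK_ENTRY

print(router_lsa_length(126))        # 1536, as in the output above
print(max_links_unfragmented(1500))  # 119
```

So on a 1500-byte link, a router with more than 119 links (and no TOS metrics) produces a router LSA that IP has to fragment.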
The difference between OSPF and ISIS is that OSPF runs on top of IP, while ISIS runs directly on Layer 2.
ISIS builds one LSP per level per router. With OSPF, IP can fragment the packet.
The OSPF LSA header does have a checksum field. The re-assembled LSA can be verified.
RFC 2328:
4.3. Routing protocol packets
The OSPF protocol runs directly over IP, using IP protocol 89.
OSPF does not provide any explicit fragmentation/reassembly
support. When fragmentation is necessary, IP
fragmentation/reassembly is used. OSPF protocol packets have
been designed so that large protocol packets can generally be
split into several smaller protocol packets. This practice is
recommended; IP fragmentation should be avoided whenever
possible.
A.1 Encapsulation of OSPF packets
OSPF does not define a way to fragment its protocol packets, and
depends on IP fragmentation when transmitting packets larger than
the network MTU. If necessary, the length of OSPF packets can be up
to 65,535 bytes (including the IP header).
The issues with the OSPF adjacency not forming are related to a mismatch in MTU settings or another problem with the MTU.
Either the MTU is set differently on each side of the link, or there is a Layer 2 device in the middle with a lower MTU than the routers have on their interfaces.
In the example above, the router LSA of R1 is fragmented, but the OSPF adjacency forms fine.
I hope this clarifies things.
Thanks,
Luc
Hello Luc,
Many thanks for the detailed explanation. My confusion was whether type 1 and type 2 LSAs can be fragmented or not, as I was looking for a way for the OSPF packet to identify the fragments. Your outputs clarify this precisely. :)
Thank you once again for the explanation.
Regards,
Shreeram
Hi Luc
Thanks for this document; its explanation is very clear and precise.
I tested it in the lab and could reproduce the failure several times.
Best regards
Christian
Hi,
To resolve this issue, use the interface command "ip ospf mtu-ignore" on the router that has the lower MTU set on its interface.
Regards
Shashi