03-14-2013 01:46 PM - edited 02-21-2020 06:45 PM
Hello,
I am looking at a problem that appears to exist with a DMVPN deployment over an SP MPLS cloud. All the routers in question are ISR G2s, with the majority of spokes being 1941s running IOS 15.
Problem
The originally reported problem was poor performance between two spoke sites when users accessed services hosted out of one of the spokes. When we compared the traffic before and after the fault appeared, the average packet size for traffic between the two sites seemed to drop significantly.
This prompted some investigation into the tunnel settings (which haven't changed recently) and some troubleshooting. What I found was that when I ping between spoke sites (tried with a number of spoke->spoke pairs) with larger packet sizes (1400 bytes and 1340 bytes), the pings succeed before the spoke->spoke tunnel is formed, but as soon as the tunnel forms they fail:
Spoke1#sh dmvpn
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
UpDn Time --> Up or Down Time for a Tunnel
==========================================================================
Interface: Tunnel101, IPv4 NHRP Details
Type:Spoke, NHRP Peers:3,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 xx.xx.xx.1 yy.yy.yy.1 UP 4w1d S
1 xx.xx.xx.2 yy.yy.yy.2 UP 4w1d S
1 xx.xx.xx.123 yy.yy.yy.123 UP 4w1d S
Spoke1#ping ip <IP in spoke2 LAN> size 1400
Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to <IP in spoke2 LAN>, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 80/91/100 ms
Spoke1#ping ip <IP in spoke2 LAN> size 1400
Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to <IP in spoke2 LAN>, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Spoke1#sh dmvpn
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
UpDn Time --> Up or Down Time for a Tunnel
==========================================================================
Interface: Tunnel101, IPv4 NHRP Details
Type:Spoke, NHRP Peers:4,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 xx.xx.xx.1 yy.yy.yy.1 UP 4w1d S
1 xx.xx.xx.2 yy.yy.yy.2 UP 4w1d S
1 xx.xx.xx.3 yy.yy.yy.3 UP 00:00:15 D
1 xx.xx.xx.123 yy.yy.yy.123 UP 4w1d S
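To repeat the before/after comparison, the dynamic spoke->spoke tunnel can be torn down between attempts. On these 15.x images something like the following should clear the dynamic NHRP entry and the associated IPsec session (exact options vary by release, so treat this as a sketch):

Spoke1#clear dmvpn session peer nbma xx.xx.xx.3
! or, more bluntly, clear all dynamic NHRP entries and the crypto session:
Spoke1#clear ip nhrp
Spoke1#clear crypto session remote xx.xx.xx.3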
However, whenever I do the same test to any of the hub sites, the ping always works. Interestingly, the same performance issues are not seen on client-to-server communications when initiated from the data centre (hub).
The largest ping that will work on all the spoke->spoke paths once the dynamic tunnel is set up is 1318 bytes.
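That 1318-byte ceiling may be telling: adding typical DMVPN encapsulation overhead (roughly 80 bytes for GRE with a tunnel key plus IPsec ESP, depending on the transform set) lands close to 1400, which would suggest the spoke->spoke transport path only passes packets of around 1400 bytes while the spoke->hub path passes the full 1500. To pin down the exact ceiling, DF-bit pings bracketing the limit are a quick check, e.g.:

Spoke1#ping <IP in spoke2 LAN> size 1318 df-bit
Spoke1#ping <IP in spoke2 LAN> size 1319 df-bit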
Question
A couple of things spring to mind:
1) Would this be expected behaviour, i.e. seeing a difference between spoke->spoke and spoke->hub when pinging with larger packets?
2) Are there useful commands to help troubleshoot (I have run through a fair few in the Cisco docs)?
Any guidance, tips etc. would be gratefully received.
Tunnel Configuration
Hub:
interface Tunnel101
description HUB-1 Tunnel
bandwidth 20000
ip address yy.yy.yy.1 255.255.255.0
no ip redirects
no ip unreachables
no ip proxy-arp
ip mtu 1400
ip hello-interval eigrp 1 30
ip hold-time eigrp 1 180
no ip next-hop-self eigrp 1
no ip split-horizon eigrp 1
ip nhrp authentication #####
ip nhrp map multicast dynamic
ip nhrp network-id 101
ip nhrp holdtime 360
ip tcp adjust-mss 1360
delay 1000
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 101
tunnel protection ipsec profile ######
Spoke:
interface Tunnel101
description Spoke Tunnel
bandwidth 1000
ip address yy.yy.yy.3 255.255.255.0
no ip redirects
ip mtu 1400
no ip next-hop-self eigrp 1
no ip split-horizon eigrp 1
ip nhrp authentication #######
ip nhrp map yy.yy.yy.1 xx.xx.xx.1
ip nhrp map yy.yy.yy.2 xx.xx.xx.2
ip nhrp map multicast xx.xx.xx.1
ip nhrp map multicast xx.xx.xx.2
ip nhrp map yy.yy.yy.123 xx.xx.xx.123
ip nhrp map multicast xx.xx.xx.123
ip nhrp network-id 101
ip nhrp holdtime 360
ip nhrp nhs yy.yy.yy.1
ip nhrp nhs yy.yy.yy.2
ip nhrp nhs yy.yy.yy.123
ip tcp adjust-mss 1360
delay 1000
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 101
tunnel protection ipsec profile ########
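In case it is useful to others, one interim measure being considered (not a fix for the underlying transport problem) is clamping the tunnel IP MTU and TCP MSS below the observed 1318-byte ceiling on all tunnel interfaces. A minimal sketch, with values assumed from the 1318-byte result (MSS = IP MTU minus 40 bytes of IP and TCP headers):

interface Tunnel101
 ip mtu 1318
 ip tcp adjust-mss 1278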
03-20-2013 03:20 AM
Rob,
All I can offer at the moment is divide-and-conquer style guidance.
(I assume that underneath, the NHRP shortcuts and routing are correct.)
Check which protocols are affected:
- Are both UDP and TCP affected?
- Are only large transfers affected (typically over TCP, with rare exceptions on UDP)?
- What is the protocol/application sensitive to (delay, fragmentation, packet loss)?
If it's mostly TCP, you might want to lower the MSS too, or check what values are exchanged between the hosts in the packet traces.
Check delay (ICMP and UDP probes are available on IOS via IP SLA), fragmentation (show ip traffic gives you a global view of fragmentation and reassembly counters) and packet loss (begin with the interface statistics).
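As a sketch, a minimal IP SLA UDP probe between the spokes could look like the following (the probe ID, port and payload size here are assumptions, and udp-echo needs ip sla responder enabled on the target router):

ip sla 10
 udp-echo <IP of spoke2> 5000
 request-data-size 1300
 frequency 10
ip sla schedule 10 life forever start-time now
! on the target spoke:
ip sla responder
! then read the results with:
show ip sla statistics 10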
Verify on both the transport and the overlay network.
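Since the tunnels are sourced from Loopback0, the spoke-to-spoke transport path can be tested directly, bypassing the tunnel, with something like (addresses as in the post above):

Spoke1#ping xx.xx.xx.3 source Loopback0 size 1400 df-bit
Spoke1#ping xx.xx.xx.3 source Loopback0 size 1500 df-bit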
Compare traceroutes for the working and non-working scenarios (overlay and transport network).
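For instance, overlay versus transport from the same spoke:

Spoke1#traceroute <IP in spoke2 LAN>
Spoke1#traceroute xx.xx.xx.3 source Loopback0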
A sniffer is definitely the way to go to see where the problem is coming from.
M.