03-14-2013 01:46 PM - edited 02-21-2020 06:45 PM
Hello,
I am looking at a problem that appears to exist with a DMVPN deployment over an SP MPLS cloud. All the routers in question are ISR G2s, with the majority of spokes being 1941s running IOS 15.
Problem
The originally reported problem was poor performance between two spoke sites when users accessed services hosted at one of the spokes. When we compared the traffic before and after the fault appeared, the average packet size seemed to drop significantly for traffic between the two sites.
This prompted some investigation into the tunnel settings (which haven't changed recently) and some troubleshooting. What I found was that when I ping between spoke sites (tried with a number of spoke->spoke pairs) using larger packets (1400 bytes and 1340 bytes), the pings succeed before the spoke->spoke tunnel is formed, but as soon as the tunnel forms they fail:
Spoke1#sh dmvpn
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
UpDn Time --> Up or Down Time for a Tunnel
==========================================================================
Interface: Tunnel101, IPv4 NHRP Details
Type:Spoke, NHRP Peers:3,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 xx.xx.xx.1 yy.yy.yy.1 UP 4w1d S
1 xx.xx.xx.2 yy.yy.yy.2 UP 4w1d S
1 xx.xx.xx.123 yy.yy.yy.123 UP 4w1d S
Spoke1#ping ip <IP in spoke2 LAN> size 1400
Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to <IP in spoke2 LAN>, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 80/91/100 ms
Spoke1#ping ip <IP in spoke2 LAN> size 1400
Type escape sequence to abort.
Sending 5, 1400-byte ICMP Echos to <IP in spoke2 LAN>, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Spoke1#sh dmvpn
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
UpDn Time --> Up or Down Time for a Tunnel
==========================================================================
Interface: Tunnel101, IPv4 NHRP Details
Type:Spoke, NHRP Peers:4,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 xx.xx.xx.1 yy.yy.yy.1 UP 4w1d S
1 xx.xx.xx.2 yy.yy.yy.2 UP 4w1d S
1 xx.xx.xx.3 yy.yy.yy.3 UP 00:00:15 D
1 xx.xx.xx.123 yy.yy.yy.123 UP 4w1d S
However, whenever I do the same test to any of the hub sites the ping always works. Interestingly, the same performance issues are not seen on client-to-server communications when initiated from the data center (hub).
The largest ping that works spoke-to-spoke once the dynamic tunnel is set up is 1318 bytes.
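(For anyone wanting to repeat the test, a simple size sweep such as the one below is enough to narrow down the largest size that gets through; the addresses are placeholders, and the df-bit keyword stops the probes being fragmented along the way.)
Spoke1#ping ip <IP in spoke2 LAN> size 1318 df-bit
Spoke1#ping ip <IP in spoke2 LAN> size 1319 df-bit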
Question
A couple of things spring to mind:
1) Is it expected behaviour to see a difference between spoke->spoke and spoke->hub when pinging with larger packets?
2) Are there any useful commands to help troubleshoot (I have run through a fair few in the Cisco docs)?
Any guidance, tips, etc. would be gratefully received.
Tunnel Configuration
interface Tunnel101
description HUB-1 Tunnel
bandwidth 20000
ip address yy.yy.yy.1 255.255.255.0
no ip redirects
no ip unreachables
no ip proxy-arp
ip mtu 1400
ip hello-interval eigrp 1 30
ip hold-time eigrp 1 180
no ip next-hop-self eigrp 1
no ip split-horizon eigrp 1
ip nhrp authentication #####
ip nhrp map multicast dynamic
ip nhrp network-id 101
ip nhrp holdtime 360
ip tcp adjust-mss 1360
delay 1000
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 101
tunnel protection ipsec profile ######
interface Tunnel101
description Spoke Tunnel
bandwidth 1000
ip address yy.yy.yy.3 255.255.255.0
no ip redirects
ip mtu 1400
no ip next-hop-self eigrp 1
no ip split-horizon eigrp 1
ip nhrp authentication #######
ip nhrp map yy.yy.yy.1 xx.xx.xx.1
ip nhrp map yy.yy.yy.2 xx.xx.xx.2
ip nhrp map multicast xx.xx.xx.1
ip nhrp map multicast xx.xx.xx.2
ip nhrp map yy.yy.yy.123 xx.xx.xx.123
ip nhrp map multicast xx.xx.xx.123
ip nhrp network-id 101
ip nhrp holdtime 360
ip nhrp nhs yy.yy.yy.1
ip nhrp nhs yy.yy.yy.2
ip nhrp nhs yy.yy.yy.123
ip tcp adjust-mss 1360
delay 1000
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 101
tunnel protection ipsec profile ########
03-14-2013 02:18 PM
Rob,
A few things to raise:
a) If you ping between the tunnel source interfaces with the DF bit set, what is the maximum packet size between the two spokes?
b) When the spoke-to-spoke tunnel is established, are you sure it's taking the direct path (test from both sides separately)? Do a traceroute from one spoke to the other spoke's LAN.
To answer your question, there should not be any difference in behaviour spoke-to-spoke compared to spoke-to-hub IF the same conditions apply on the path.
We still rely on ping (with the DF bit, just in case) and traceroute for those things, plus show ip nhrp.
I would also recommend that you start using "tunnel path-mtu-discovery".
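It goes under the tunnel interface; roughly (using the interface name from your config):
interface Tunnel101
 tunnel path-mtu-discovery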
There are also a few bugs relating to MTU discovery, etc., but if you're on something fairly recent you should be OK.
See:
http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtq09372
M.
03-14-2013 02:31 PM
Hi Marcin,
Thanks for the response.
a) The testing I did involved a combination of setting the DF bit and not setting it (I probably should have mentioned that), and the pattern seemed to be the same. The max size was also 1318; however, I will double-check that in the morning when I get back to site.
b) I carried out the traceroutes to confirm the path, and it did indeed take the direct path.
It's good that my understanding is the same as yours when it comes to spoke->hub and spoke->spoke behaviour.
I will have a look at the command to get a better understanding of it and review the bug ID.
Rob
03-14-2013 02:38 PM
Rob,
It's desirable most of the time, in case the MTU changes somewhere.
Something does not add up, indeed: if you can only ping up to that size with the DF bit set from tunnel source to tunnel source, that's already a cause for alarm - typically you would have just above 1500 bytes of MTU throughout the entire MPLS network.
If you ping between the tunnel-assigned IP addresses, you can give "debug ip packet <ACL> detail" a shot; you will see the actual processing (since the packets will be sourced from or destined to local addresses). You can check whether they behave differently on the router itself based on packet size - they should not.
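Something along these lines (addresses are placeholders; keep the ACL tight so the console is not flooded, and turn it off with "undebug all" when done):
access-list 199 permit icmp host <spoke1 tunnel IP> host <spoke2 tunnel IP>
access-list 199 permit icmp host <spoke2 tunnel IP> host <spoke1 tunnel IP>
debug ip packet 199 detail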
It's hard to give a comprehensive troubleshooting guide; part of it comes from interpreting basic things like ping/traceroute with different sizes, and part will come from CEF, etc. (if it comes to it).
Coincidentally, the IPsec SA will also keep a path MTU; you might want to check what the IPsec SA MTU associated with the NBMA IP address of the spoke is (both ways).
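The quickest way I know to see it is something like this (the peer being the other spoke's NBMA address):
show crypto ipsec sa peer <spoke NBMA IP> | include mtu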
M.
03-14-2013 03:09 PM
Thanks again, you have given me some more food for thought and also made me feel better that I am probably using the correct troubleshooting mechanisms for this and not missing some key point.
03-15-2013 02:01 AM
Rob,
For what it's worth, after 5 years of troubleshooting different VPNs I'm still learning new things; you're definitely on the right track. There's hardly any magic to troubleshooting these things until you get down to the data path, at least for me.
M.
04-03-2013 01:21 PM
For what it's worth :), don't disable ip unreachables on your tunnel interfaces.
03-15-2013 03:58 AM
Marcin,
I have just double-checked the path it takes: when it goes via the hub in the DC I get a response; however, when it goes via the dynamically created tunnel, it drops.
The max ping sizes I get for Tunnel IP -> Tunnel IP between the spokes (with the DF bit set) are as follows: when pinging the hubs (3 in total) it's 1400; however, the dynamic tunnels between spokes all seem to show the same behaviour of a maximum of 1318 bytes (well, the ones I've checked anyway).
03-15-2013 04:03 AM
Rob,
Before we get down to forwarding, are you absolutely sure you also pinged from tunnel source to tunnel source (i.e. ping to/from Loopback0, as per the config you pasted above)?
M.
03-15-2013 04:41 AM
Ah, sorry, I was doing Tunnel101 IP -> Tunnel101 IP rather than Loop0 -> Loop0. Doh!
So when I do Loop0 -> Loop0 (with the DF bit set)...
I'm just working through some of the traceroutes for the loopbacks.
03-15-2013 06:41 AM
Rob,
If the MTU on the path between spokes is lower than on the way to the DC, I think we have the spoke-to-spoke behaviour over DMVPN explained.
1400 bytes - 28 bytes for GRE - overhead of IPsec = 1318 bytes
I can tell you more if you tell me what transform sets you are using and whether it's mode tunnel or mode transport (the latter being recommended).
03-17-2013 11:04 AM
Marcin,
That makes sense. The transform set currently configured is:
crypto ipsec transform-set AES128 ah-sha-hmac esp-aes
mode transport
03-17-2013 11:18 AM
Rob,
Well, my calculations seem a bit odd :-)
(for a packet size of 1318, assuming no tunnel key):
24 bytes GRE header
4 bytes SPI (ESP header)
4 bytes Sequence (ESP Header)
16 byte IV (IOS ESP-AES)
12 byte pad (ESP-AES 128 bit)
1 byte Pad length (ESP Trailer)
1 byte Next Header (ESP Trailer)
1 byte Next Header (AH Header)
1 byte Payload Length (AH Header)
2 byte reserved (AH Header)
4 byte SPI (AH Header)
4 byte Sequence (AH Header)
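(Tallying the items above gives 74 bytes, i.e. 1326 rather than the observed 1318, so something is still missing. My guess: add the 12-byte AH ICV and the 4-byte GRE key field - the tunnels do carry "tunnel key 101" - and a 1318-byte inner packet should come out at roughly 1396 bytes on the wire with no AES padding needed, whereas 1319 bytes would pull in an extra 16-byte cipher block and exceed 1400. That would line up with the observed maximum, assuming the spoke-to-spoke underlay path tops out at around 1400 bytes.)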
M.
03-18-2013 12:19 PM
Hi Marcin,
Thanks for your guidance on this one. I thought I would provide a quick update.
I ended up reducing the MTU and MSS settings under the Tunnel interface to take into account the lower MTU allowed over the service provider cloud. I also enabled tunnel path-mtu-discovery.
This cleared up the issues experienced by the spokes where the configuration was altered. Now to get to the bottom of what changed in SP land...
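For reference, the gist of the change on the affected spokes was along these lines (the 1300 is what suited our SP path; the MSS value is simply the usual MTU minus 40):
interface Tunnel101
 ip mtu 1300
 ip tcp adjust-mss 1260
 tunnel path-mtu-discovery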
03-19-2013 11:00 AM
Another question, around something that is confusing me at the moment (until I have maybe had some sleep)... The hub routers have MTU 1400 configured on the tunnel interface. When the spoke sites that are now working with 1300 MTU talk to a large spoke site (hosting a WWW service) that is still configured at 1400, responses are good. But when the spoke site hosting the WWW service is dropped to 1300 (in an attempt to standardize the spoke config), the spoke-to-spoke performance is very slow. i.e.:
Spoke (1300 MTU) -> Spoke (1400 MTU)
DC (1400 MTU) -> Spoke (1400 MTU)
Spoke (1300 MTU) -> Spoke (1300 MTU)
DC (1400 MTU) -> Spoke (1300 MTU)
I have started looking at packet captures to see if that can help guide me further.
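In case it helps anyone following along, the on-box capture I'm using is roughly this (classic Embedded Packet Capture on 15.x; the interface name is a placeholder for the WAN-facing interface):
monitor capture buffer SPOKECAP
monitor capture point ip cef CAP-WAN GigabitEthernet0/0 both
monitor capture point associate CAP-WAN SPOKECAP
monitor capture point start CAP-WAN
monitor capture point stop CAP-WAN
show monitor capture buffer SPOKECAP dump
I then export the buffer off-box and compare it against a capture taken at the far end.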