Spoke to Spoke DMVPN behavior

redwards
Level 1

Hello,

I am looking into a problem with a DMVPN deployment over an SP MPLS cloud. All the routers in question are ISR G2s, and most of the spokes are 1941s running IOS 15.

Problem

The originally reported problem was poor performance between two spoke sites when users accessed services hosted at one of the spokes. When we compared the traffic from before and after the fault appeared, the average packet size seemed to drop significantly for traffic between the two sites.

This prompted an investigation into the tunnel settings (which haven't changed recently) and some troubleshooting. What I found was that when I ping between spoke sites (I tried this with a number of spoke->spoke pairs) with larger packet sizes (1400 bytes and 1340 bytes), the pings succeed before the spoke->spoke tunnel is formed, but as soon as the tunnel forms they fail:

Spoke1#sh dmvpn

Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete

        N - NATed, L - Local, X - No Socket

        # Ent --> Number of NHRP entries with same NBMA peer

        NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting

        UpDn Time --> Up or Down Time for a Tunnel

==========================================================================

Interface: Tunnel101, IPv4 NHRP Details

Type:Spoke, NHRP Peers:3,

# Ent  Peer NBMA Addr Peer Tunnel Add State  UpDn Tm Attrb

----- --------------- --------------- ----- -------- -----

     1    xx.xx.xx.1     yy.yy.yy.1    UP     4w1d     S

     1    xx.xx.xx.2     yy.yy.yy.2    UP     4w1d     S

     1  xx.xx.xx.123   yy.yy.yy.123    UP     4w1d     S

Spoke1#ping ip <IP in spoke2 LAN> size 1400

Type escape sequence to abort.

Sending 5, 1400-byte ICMP Echos to <IP in spoke2 LAN>, timeout is 2 seconds:

!!!!!

Success rate is 100 percent (5/5), round-trip min/avg/max = 80/91/100 ms

Spoke1#ping ip <IP in spoke2 LAN> size 1400

Type escape sequence to abort.

Sending 5, 1400-byte ICMP Echos to <IP in spoke2 LAN>, timeout is 2 seconds:

.....

Success rate is 0 percent (0/5)

Spoke1#sh dmvpn

Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete

        N - NATed, L - Local, X - No Socket

        # Ent --> Number of NHRP entries with same NBMA peer

        NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting

        UpDn Time --> Up or Down Time for a Tunnel

==========================================================================

Interface: Tunnel101, IPv4 NHRP Details

Type:Spoke, NHRP Peers:4,

# Ent  Peer NBMA Addr Peer Tunnel Add State  UpDn Tm Attrb

----- --------------- --------------- ----- -------- -----

     1    xx.xx.xx.1     yy.yy.yy.1    UP     4w1d     S

     1    xx.xx.xx.2     yy.yy.yy.2    UP     4w1d     S

     1    xx.xx.xx.3     yy.yy.yy.3    UP 00:00:15     D

     1  xx.xx.xx.123   yy.yy.yy.123    UP     4w1d     S

However, whenever I run the same test to any of the hub sites, the ping always works. Interestingly, the same performance issues are not seen on client-to-server communications when they are initiated from the data center (hub).

The largest ping that works on all the spoke-to-spoke paths once the dynamic tunnel is set up is 1318 bytes.
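For anyone repeating this test, the size limit can be found with a binary search over ping sizes rather than by trial and error. The following is a minimal sketch only: `largest_passing_size` and `fake_probe` are hypothetical names, and the probe is a stand-in that simply encodes the 1318-byte limit reported above. In practice the probe would be whatever runs the actual ping (with DF set) once the dynamic tunnel is up.

```python
from typing import Callable


def largest_passing_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes the probe is monotonic: every size up to the limit passes and
    every size above it fails. Returns low - 1 if no size passes.
    """
    best = low - 1
    while low <= high:
        mid = (low + high) // 2
        if probe(mid):
            best = mid          # mid works, so look for something larger
            low = mid + 1
        else:
            high = mid - 1      # mid fails, so the limit is below it
    return best


def fake_probe(size: int) -> bool:
    # Stand-in for a real ping: pretend anything above 1318 bytes is dropped.
    return size <= 1318


print(largest_passing_size(fake_probe, 1000, 1400))  # prints 1318
```

The search needs about nine probes to cover the 1000 to 1400 range, which matters when each probe is a five-packet ping with a two-second timeout.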

Question

A couple of things spring to mind;

1) Would this be expected behaviour to see a difference between spoke->spoke compared to spoke->hub when pinging with larger packets?

2) Useful commands to help troubleshoot (I have run through a fair few in the Cisco docs)?

Any guidance, tips, etc. would be gratefully received.

Tunnel Configuration

interface Tunnel101

description HUB-1 Tunnel

bandwidth 20000

ip address xx.xx.xx.1 255.255.255.0

no ip redirects

no ip unreachables

no ip proxy-arp

ip mtu 1400

ip hello-interval eigrp 1 30

ip hold-time eigrp 1 180

no ip next-hop-self eigrp 1

no ip split-horizon eigrp 1

ip nhrp authentication #####

ip nhrp map multicast dynamic

ip nhrp network-id 101

ip nhrp holdtime 360

ip tcp adjust-mss 1360

delay 1000

tunnel source Loopback0

tunnel mode gre multipoint

tunnel key 101

tunnel protection ipsec profile ######

interface Tunnel101

description Spoke Tunnel

bandwidth 1000

ip address yy.yy.yy.3 255.255.255.0

no ip redirects

ip mtu 1400

no ip next-hop-self eigrp 1

no ip split-horizon eigrp 1

ip nhrp authentication #######

ip nhrp map yy.yy.yy.1 xx.xx.xx.1

ip nhrp map yy.yy.yy.2 xx.xx.xx.2

ip nhrp map multicast xx.xx.xx.1

ip nhrp map multicast xx.xx.xx.2

ip nhrp map yy.yy.yy.123 xx.xx.xx.123

ip nhrp map multicast xx.xx.xx.123

ip nhrp network-id 101

ip nhrp holdtime 360

ip nhrp nhs yy.yy.yy.1

ip nhrp nhs yy.yy.yy.2

ip nhrp nhs yy.yy.yy.123

ip tcp adjust-mss 1360

delay 1000

tunnel source Loopback0

tunnel mode gre multipoint

tunnel key 101

tunnel protection ipsec profile ########
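The tunnel settings above (ip mtu 1400, a GRE key, IPsec protection) determine how large each packet becomes once it is encapsulated. The following is a rough sketch of that arithmetic, not a statement about this network: the IPsec profile is masked in the post, so the cipher (AES-CBC with HMAC-SHA1-96), the choice of transport or tunnel mode, and the absence of NAT-T are all assumptions, and `encrypted_size` is a hypothetical helper name.

```python
OUTER_IP = 20      # outer IPv4 header, no options
GRE_BASE = 4       # basic GRE header
GRE_KEY = 4        # extra GRE field because the tunnels use "tunnel key 101"
ESP_HEADER = 8     # SPI + sequence number
ESP_TRAILER = 2    # pad-length byte + next-header byte


def encrypted_size(inner_len: int, transport_mode: bool = True,
                   block: int = 16, iv: int = 16, icv: int = 12) -> int:
    """On-the-wire size of one GRE-over-ESP packet carrying inner_len bytes.

    block/iv/icv default to AES-CBC with HMAC-SHA1-96. These are ASSUMED
    values; the real transform set is not shown in the post.
    """
    gre_payload = GRE_BASE + GRE_KEY + inner_len
    # In tunnel mode the GRE delivery IP header is encrypted as well and a
    # new outer IP header is added; in transport mode it stays outside ESP.
    plaintext = gre_payload if transport_mode else OUTER_IP + gre_payload
    # ESP pads plaintext + trailer up to a multiple of the cipher block size.
    padded = ((plaintext + ESP_TRAILER + block - 1) // block) * block
    return OUTER_IP + ESP_HEADER + iv + padded + icv


for size in (1318, 1319, 1400):
    print(size, encrypted_size(size), encrypted_size(size, transport_mode=False))
```

Under these assumed parameters, a 1318-byte inner packet becomes 1384 bytes in transport mode and a 1319-byte packet becomes 1400, because the padding rolls over to the next 16-byte block. Comparing such figures with the MTU actually available on the spoke-to-spoke transport path is one way to check whether encapsulation overhead explains the limit.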

15 Replies

Rob,

All I can offer is divide-and-conquer type of guidance at the moment.

(I assume underneath that NHRP shortcuts and routing are correct.)

Check what protocols are affected:

- Are both UDP and TCP affected?

- Are only big transfers affected (typically over TCP, with the rare exception over UDP)?

- What is that protocol/application sensitive to (delay, fragmentation, packet loss)?

If it's mostly TCP, you might want to lower the MSS too, or check what values are passed between hosts in the packet traces.
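As a rough illustration of the MSS arithmetic, here is a minimal sketch assuming plain IPv4 and TCP headers with no options. `mss_for_mtu` is a hypothetical helper, and 1318 is simply the largest working ping size reported in the original post, not a confirmed path MTU.

```python
IPV4_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20    # TCP header without options


def mss_for_mtu(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet of ip_mtu bytes."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


print(mss_for_mtu(1400))  # 1360, matching the configured "ip tcp adjust-mss 1360"
print(mss_for_mtu(1318))  # 1278, what a 1318-byte limit would imply
```

If the spoke-to-spoke path really only passes 1318-byte packets, segments sized for an MSS of 1360 would exceed it, which is why the values seen in the packet traces are worth checking.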

Check delay (ICMP and UDP pings are available on IOS), fragmentation (show ip traffic gives you a global view of fragmentations and reassemblies) and packet loss (begin with interface stats).

Verify on both transport and overlay network.

Compare traceroutes for working and non-working scenarios (on both the overlay and the transport network).

Sniffer is definitely the way to go to see where the problem is coming from.

M.