EIGRP Flapping

WILLIAM STEGMAN
Level 4

We've deployed DMVPN across our MPLS network; each branch has an EIGRP peering with a hub router at each of 2 data centers.  Each of the hubs is an ASR 1002X running an identical IOS image.  One of the 2 hub ASRs consistently flaps with all the branches.  Not all branches go down at the same time; it staggers throughout the day.

I've adjusted the timers on the tunnel interface to as high as 60 seconds for hello and 180 for hold time, and have upped the EIGRP bandwidth percentage to 500%.  I created a 2nd tunnel interface, GRE only, on the problem hub and one of the branches and added that network to EIGRP, and the link stayed up.  I then added the DMVPN profile to that tunnel interface, and it continued to stay up.

I also checked with the service provider to see whether any QoS drops on CS6 might have accounted for the flapping, but found nothing.  Just in case, I began marking the EIGRP traffic into our critical data class (a CoS we have with the SP that I can monitor for drops), and there were no drops for that class either.  I also tried swapping the router with an RMA unit, but it continues to flap.  So as best I can tell, this is not transport or hardware related.

The other data center ASR hub is working fine, so either there is a unique combination of properties on that ASR, or there is something wrong in the EIGRP configuration behind that ASR (the data center network, for example) that presents itself on that particular ASR and not the other.  I'm out of ideas and was hoping someone might have been through something similar and has a lead on what I might try next.
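For reference, the timer and bandwidth adjustments described above would look roughly like this under the tunnel interface (a sketch only; the interface name and AS 10 are taken from the running config shown below):

```
interface Tunnel61
 ! hello raised to 60 s, hold time to 180 s for EIGRP AS 10
 ip hello-interval eigrp 10 60
 ip hold-time eigrp 10 180
 ! allow EIGRP to use up to 500% of the configured tunnel bandwidth
 ip bandwidth-percent eigrp 10 500
```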

MTA-DMVPN-MPLS# sh run int tu61
Building configuration...

Current configuration : 737 bytes
!
interface Tunnel61
description MTA GRE/IPSEC Tunnel via MPLS to Remotes
bandwidth 102400
ip address 10.10.4.2 255.255.252.0
no ip redirects
ip mtu 1400
ip nbar protocol-discovery
no ip next-hop-self eigrp 10
no ip split-horizon eigrp 10
ip pim nbma-mode
ip pim sparse-mode
ip nhrp authentication 123456
ip nhrp map multicast dynamic
ip nhrp network-id 1234
ip nhrp holdtime 600
ip nhrp redirect
ip tcp adjust-mss 1360
tunnel source GigabitEthernet0/0/2.3233
tunnel mode gre multipoint
tunnel key 1234
tunnel vrf IWAN_MPLS
tunnel protection ipsec profile DMVPN-PROFILE-IWAN_MPLS
end

We block nets tagged 60, 61, 160, or 161 from entering any DMVPN router.
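A hedged sketch of what that tag-based filter might look like (the route-map name is hypothetical; AS 10 and Tunnel61 come from the config above):

```
route-map BLOCK-DMVPN-TAGS deny 10
 match tag 60 61 160 161
route-map BLOCK-DMVPN-TAGS permit 20
!
router eigrp 10
 distribute-list route-map BLOCK-DMVPN-TAGS in Tunnel61
```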

thank you

29 Replies

Francesco Molino
VIP Alumni

Hi

Did you run an EIGRP debug? I would investigate the MTU; these flapping issues are generally due to MTU problems. Did you check that as well?
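A sketch of the kind of check being suggested here (exact debug output varies by IOS version, and this can be verbose on a hub with many neighbors):

```
! Watch the EIGRP hello exchange to see which side stops hearing hellos
debug eigrp packets hello
! Confirm the neighbor's uptime and hold timer countdown
show ip eigrp neighbors
```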

Thanks
Francesco
PS: Please don't forget to rate and select as validated answer if this answered your question

Hello.  I did run a debug on the packets.  It only states that 3 hellos were missed and therefore the tunnel would be torn down.  1400 is the same MTU I have set on the other hub router, and I've tweaked the MTU as low as 1300 with 1260 for the MSS, but no change.  I'd also mention that the hub router has an EIGRP neighbor in the data center core, and that relationship stays up.  It's only an issue with the DMVPN over MPLS.

thank you

Could you try one thing so we can see how EIGRP behaves?

Between this hub and spoke, instead of the dynamic mapping, could you configure a static mapping? Let me know the result.

Could you also check the hello interval and hold timers on both sides?

Can you run a debug ip packet on both sides to see if there is one-way communication?

Basic stuff, but just to be sure: try pinging from both sides to the front-end IP with a packet the size of the MTU you've set up. From the hub, ping the spoke with the MTU you've defined, and then the reverse.
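A sketch of that MTU check (the hub's NBMA address 10.5.0.5 and the 1400-byte tunnel MTU come from this thread; the spoke's front-end address is a placeholder):

```
! From a spoke: test the underlay path to the hub's NBMA address.
! df-bit makes any fragmentation problem show up as drops.
ping 10.5.0.5 size 1400 df-bit
! From the hub: the same test toward the spoke's NBMA address
! (192.0.2.10 is a placeholder), sourced from the tunnel-source VRF
! since the hub's tunnel uses "tunnel vrf IWAN_MPLS".
ping vrf IWAN_MPLS 192.0.2.10 size 1400 df-bit
```

Note that the tunnel payload MTU is 1400, so the underlay must actually carry 1400 bytes plus GRE/IPsec overhead; a clean 1400-byte df-bit ping on the underlay is a necessary but not sufficient check.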


Thanks
Francesco
PS: Please don't forget to rate and select as validated answer if this answered your question

If I understand you correctly: from one branch, I removed the ip nhrp map multicast 10.5.0.5, leaving only ip nhrp map 10.10.4.2 10.5.0.5 under the tunnel interface.  It only stayed up for a couple of minutes before bouncing.

From the hub:

May 16 08:52:14.583 edt: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.10.5.200 (Tunnel61) is down: Peer Termination received

May 16 08:52:14.758 edt: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.10.5.200 (Tunnel61) is up: new adjacency

Pings to and from hub and spokes are able to reach the tunnel interface IP and the front end IP using a size of 1400.  

Hello and hold times are identical on both sides, 20 and 60.

I haven't been able to successfully run a packet debug on the hub side, since as best I can tell I cannot filter that debug to a particular peer, and with hundreds of branches I haven't risked it.  I have seen the debug from the spoke side, and it just indicates that the spoke missed 3 consecutive hellos and initiated the teardown.
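For what it's worth, debug ip packet can usually be scoped with an extended access list, which keeps the output to a single peer; a hedged sketch using the neighbor addresses from the logs above (ACL number 101 is arbitrary):

```
! Match only EIGRP (IP protocol 88) between the hub and the one spoke
access-list 101 permit eigrp host 10.10.4.2 host 10.10.5.200
access-list 101 permit eigrp host 10.10.5.200 host 10.10.4.2
! debug ip packet only shows process-switched traffic, but EIGRP
! control packets are handled at process level, so they should appear
debug ip packet detail 101
```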

thanks

Hi William, 

Do you have NAT configured in the physical interface GigabitEthernet0/0/2?

Do you see drops on the interface, or is it perhaps heavily loaded?

Regards!

JC

Hi Carlos.  I'm not using NAT.  Utilization is fairly low at that hub; it's primarily a backup data center, but it does serve as primary for a certain region.  There are output drops on the interface, similar to the other hub site, but they are likely the result of the shaping policy.  However, I've removed that policy as a test before, and it didn't resolve the issue.

Otherwise, no errors.  We actually have a BGP peering with the service provider at both hubs using the VRF that I extended to the core in order to support service provider services, e.g. SIP trunking.  It's essentially a parallel path to the Service Provider's networks, while the spoke networks are carried from the DMVPN hub's global routing table to the core.  That peering relationship never goes down.

thanks

Hi!

Can you try pinging the other side using static routes? (Take just a pair of hosts and configure static routes so both can ping without interference from the EIGRP flapping.)

This would help us isolate whether the peer sending the Peer Termination is dropping EIGRP packets specifically, or whether it is really dropping any packet traversing the tunnel.
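A sketch of that static-route test (the host addresses are placeholders; the tunnel IPs come from the logs in this thread):

```
! On the hub: reach a branch test host via the spoke's tunnel IP
ip route 192.0.2.50 255.255.255.255 10.10.5.200
! On the branch: the mirror route via the hub's tunnel IP
ip route 192.0.2.60 255.255.255.255 10.10.4.2
```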

Let me know how it went. Regards!

JC

Carlos, I've put a static route in on either side for communication between a pair of hosts, one at a branch and one at the hub, but this peering is torn down and re-established so quickly that I'm not hopeful we'll see much in terms of drops.  Here's how quick it typically is:

From the hub:

May 16 08:52:14.583 edt: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.10.5.200 (Tunnel61) is down: Peer Termination received

May 16 08:52:14.758 edt: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.10.5.200 (Tunnel61) is up: new adjacency

Hi!

If you ping from host A in branch A to host B in branch B, do you see ping losses?

Regards!

JC

None.  Connectivity appears to be fine throughout the day despite every one of the few hundred spokes resetting its EIGRP neighbor relationship.  For example, today all few hundred have reset at some point between less than a minute ago and 3 hours and 22 minutes ago.

Have you noticed High CPU in spoke or hub? Maybe spikes?

Regards!

JC

not really, see attached.

Do you see flaps in the NHRP session with your branches too?

No, it looks like the NHRP info stays up, along with the tunnel.  The tunnel never goes down; just the EIGRP peering resets.

sh ip nhrp

10.10.4.2/32 via 10.10.4.2
Tunnel61 created 10w1d, never expire
Type: static, Flags: used
NBMA address: 10.5.0.5
