Re: NHRP brings DMVPN down every two hours.

michael.leblanc · 12-04-2012

The ISAKMP SA lifetime for one of our DMVPNs is set to ~24 hours in the following ISAKMP policy:

crypto isakmp policy 3
 encr 3des
 group 2
 lifetime 86399

Despite this, the crypto tunnels go down every 2 hours. They come back up after about 4 seconds.

We have performed the following:

Administratively shut down the crypto tunnel interface (Tunnel0)
Cleared out the SA database
Issued "sh ip nhrp" to verify that the NHRP database was empty
Brought the crypto tunnel interface (Tunnel0) back up
Re-issued the "sh ip nhrp" command

NHRP-Server# sh ip nhrp
<spoke-tunnel-ip-addr>/32 via <spoke-tunnel-ip-addr>
Tunnel0 created 00:00:18, expire 01:59:41
Type: dynamic, Flags: unique registered used
NBMA address: <spoke-global-ip-addr>
...output trimmed for brevity.

Note: the dynamic NHRP entry on Tunnel0 was just created and is due to expire in 2 hours.
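A quick sanity check on these timers: for a dynamic entry, created + expire stays close to the holdtime until the entry is refreshed, because a refresh resets "expire" while "created" keeps counting (as seen in the lab output later in this thread). A minimal Python sketch, assuming only the hh:mm:ss timer format shown in this thread (longer-lived entries may be displayed differently and are not handled); the function name is hypothetical:

```python
import re

# Matches the timer fragment of a "show ip nhrp" entry, e.g.
# "Tunnel0 created 00:00:18, expire 01:59:41"
TIMERS = re.compile(r"created (\d+):(\d\d):(\d\d), expire (\d+):(\d\d):(\d\d)")


def created_plus_expire(line):
    """Return created + expire in seconds, or None (e.g. 'never expire')."""
    m = TIMERS.search(line)
    if m is None:
        return None
    h1, m1, s1, h2, m2, s2 = map(int, m.groups())
    return (h1 * 3600 + m1 * 60 + s1) + (h2 * 3600 + m2 * 60 + s2)


for line in [
    "Tunnel0 created 00:00:18, expire 01:59:41",  # the hub entry above
    "Tunnel0 created 00:10:26, expire 00:08:18",  # lab spoke-to-spoke entry
    "Tunnel0 created 08:54:42, never expire",     # static entry
]:
    print(line, "->", created_plus_expire(line))
```

The first entry sums to 7199 s, i.e. a fresh entry with a ~7200 s holdtime. The lab entry sums to 1124 s against a ~600 s holdtime, which only makes sense if it has been refreshed.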

NHRP commands available on the Tunnel0 interface are as follows:

NHRP-Server(config)# int t0
NHRP-Server(config-if)# ip nhrp ?
  authentication  Authentication string
  cache           NHRP Cache related commands.
  group           NHRP group name
  holdtime        Advertised holdtime
  interest        Specify an access list
  map             Map dest IP addresses to NBMA addresses
  max-send        Rate limit NHRP traffic
  network-id      NBMA network identifier
  nhs             Specify a next hop server
  record          Allow NHRP record option
  redirect        Enable NHRP redirect traffic indication
  registration    Settings for registration packets.
  responder       Responder interface
  server-only     Disable NHRP requests
  shortcut        Enable shortcut switching
  trigger-svc     Create NHRP cut-through based on traffic load
  use             Specify usage count for sending requests

What determines this 2 hour interval, and where can it be configured?

Any thoughts or solutions would be welcome.

Best Regards,

Mike

Marcin Latosiewicz · 12-05-2012

Mike,

The holdtime is 7200 seconds by default, as far as I remember. You can check the IOS master index to verify.

The thing you need to remember is that the NHRP registration interval is 1/3 of the holdtime, i.e. a spoke re-registers with the hub once every registration interval.

The holdtime expires if no registration has been received within that period.
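To make the arithmetic concrete, here is a small Python sketch. The function name is hypothetical; it simply encodes the one-third rule described above, plus the explicit override from "ip nhrp registration timeout", a command discussed later in this thread:

```python
def registration_interval(holdtime, registration_timeout=None):
    """Seconds between registrations a spoke (NHC) sends to the hub (NHS).

    Default: one third of the holdtime, unless
    'ip nhrp registration timeout' is configured explicitly.
    """
    if registration_timeout is not None:
        return registration_timeout
    return holdtime // 3


print(registration_interval(7200))       # default holdtime: 2400 s (40 min)
print(registration_interval(1800))       # holdtime 1800: 600 s (10 min)
print(registration_interval(1800, 300))  # with registration timeout 300: 300 s
```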

M.

michael.leblanc · 12-05-2012

Marcin:

Thank you for your response.

I've reviewed our syslog files and determined that it was only spoke-to-spoke crypto tunnels that were bouncing on a two hour interval.

However, during my observations today, I did not witness any bouncing of the spoke-to-spoke crypto tunnels.

The dynamic entries in the NHRP Server's cache (show ip nhrp) were being refreshed every 40 minutes (1/3 of the default 120 min. holdtime), as expected.

The dynamic entries in the caches of the NHRP Spokes were behaving differently. These expired every 120 min. (default holdtime) without being refreshed. I'm not sure if this behavior is normal.

ISAKMP phase I and phase II SAs were renegotiated successfully according to their configured lifetimes, independent of NHRP dynamics.

At this time, I do not know why the symptom prominently recorded in recent syslog files was not occurring today.

Your response prompted me to take a look at some of the command references. As a result, I decided to incorporate the following:

ip nhrp holdtime 1800
ip nhrp registration no-unique
ip nhrp registration timeout 300

I'll monitor the routers to see if the symptom returns, and if so, whether the interval corresponds with the new holdtime.

Thanks for your interest in my posting.

Best Regards,

Mike

Marcin Latosiewicz · 12-06-2012

Mike,

NHRP registrations flow only from spoke to hub. There is no need to send them between spokes.

Indeed, your spoke-to-spoke tunnel is temporary by nature and will expire.

Let me show you an example from my lab:

172.16.0.104/32 via 172.16.0.104
Tunnel0 created 00:00:09, expire 00:09:50
Type: dynamic, Flags: router
NBMA address: 10.0.0.104

This is the entry for the spoke-to-spoke tunnel.

When it expired:

*Dec 6 08:02:44.682: %CRYPTO-5-SESSION_STATUS: Crypto tunnel is DOWN. Peer 10.0.0.104:500 Id: 10.0.0.104

(this comes from "crypto logging session" command).

Now I have also set up an SLA probe that should be going over that spoke-to-spoke tunnel.

Spoke_R3(config)#ip sla schedule 991 start-time now life forever
Spoke_R3(config)#
*Dec 6 08:11:24.714: %CRYPTO-5-SESSION_STATUS: Crypto tunnel is UP . Peer 10.0.0.104:500 Id: 10.0.0.104
Spoke_R3(config)#^Z
Spoke_R3#
*Dec 6 08:11:28.490: %SYS-5-CONFIG_I: Configured from console by console
Spoke_R3#sh ip nhrp
(parts removed)
172.16.0.104/32 via 172.16.0.104
Tunnel0 created 00:00:08, expire 00:09:51
Type: dynamic, Flags: router used
NBMA address: 10.0.0.104
Spoke_R3#term exec pro ti
Spoke_R3#sh ip nhrp
Load for five secs: 1%/0%; one minute: 1%; five minutes: 1%
Time source is hardware calendar, *08:21:51.594 CET Thu Dec 6 2012
(removed)
172.16.0.104/32 via 172.16.0.104
Tunnel0 created 00:10:26, expire 00:08:18
Type: dynamic, Flags: router used
NBMA address: 10.0.0.104

M.

michael.leblanc · 12-06-2012

Marcin:

Our DMVPN spokes are NTP peers, so we expect the spoke-to-spoke crypto tunnels to stay up, as they did today.

Best Regards,

Mike

Marcin Latosiewicz · 12-06-2012

Mike,

How is the NTP peering configured? Are you sure it goes over the spoke-to-spoke tunnel?

A traceroute between the peering IP addresses would tell you, to some extent.

M.

michael.leblanc · 12-06-2012

As mentioned earlier in the post – “The dynamic entries in the caches of the NHRP Spokes were behaving differently. These expired every 120 min. (default holdtime), without being refreshed.”

According to the Cisco document titled "Configuring NHRP":

“Each time a data packet is switched using an NHRP mapping entry, the “used” flag is set on the mapping entry. Then when the NHRP background process runs (every 60 seconds) the following actions occur:”

The list of actions depends on the switching path (process switching vs. CEF switching).

Additional reading established that the opportunity to “refresh” the mapping entry occurs when the expire time is less than or equal to 120 seconds. With the background process only taking action every 60 seconds, there is a small window of opportunity to avoid expiration when spoke-to-spoke traffic is modest.

We were counting on NTP peering between the spokes to maintain the tunnels. However, once the NTP peers have exchanged packets for a while, the NTP polling interval extends well beyond the ~120 second window of opportunity to refresh the dynamic NHRP mapping entries.

Without doubt, the spoke-to-spoke tunnels have been coming down, per the syslog entries (%CRYPTO-5-SESSION_STATUS: Crypto tunnel is DOWN.), during periods of modest or no spoke-to-spoke traffic.
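The refresh window described above can be modelled roughly in Python. This is a simplified sketch, not Cisco's implementation: it assumes a background run every 60 s, treats the entry as "used" if any packet crossed it since the previous run, refreshes it to a full holdtime when 120 s or less remain, and assumes the resolution reply arrives instantly. The function name and both traffic patterns (an NTP poll settled at 1024 s, and a probe every 60 s) are hypothetical:

```python
from bisect import bisect_right

BG_INTERVAL = 60      # NHRP background process period (per the Cisco doc quoted above)
REFRESH_WINDOW = 120  # refresh considered only when <= 120 s remain


def first_expiry(packet_times, holdtime=7200, horizon=86400):
    """Return the time (s) at which a dynamic entry created at t=0 expires,
    or None if it is still alive at `horizon`.

    Simplified model (an assumption, not Cisco's code): at each background
    run the entry is refreshed to a full holdtime if a packet used it since
    the previous run and it has <= REFRESH_WINDOW seconds left; the
    resolution reply is assumed to arrive instantly.
    """
    times = sorted(packet_times)
    expires_at = holdtime
    t = BG_INTERVAL
    while t <= horizon:
        if t >= expires_at:
            return expires_at
        # "used" flag: any packet switched via the entry in (t - 60, t]
        used = bisect_right(times, t) > bisect_right(times, t - BG_INTERVAL)
        if used and expires_at - t <= REFRESH_WINDOW:
            expires_at = t + holdtime
        t += BG_INTERVAL
    return None


ntp = [1024 * k for k in range(1, 85)]  # assumed steady-state NTP poll of 1024 s
probe = list(range(0, 86400, 60))       # a probe every 60 s, e.g. an IP SLA operation
print(first_expiry(ntp))    # 7200: the entry dies at the 2-hour mark
print(first_expiry(probe))  # None: refreshed in every window
```

In this model a packet has to land between 7020 s and 7140 s to save the entry; the NTP packets at 6144 s and 7168 s both miss it, which would match the tunnels dropping every two hours.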

Marcin:

Per your last inquiry, I did confirm that NTP was using the crypto tunnel path on the NHRP network in question, at least during the period of observation. The SLA probe used in your earlier response seems like a sensible solution.

Thank you for your participation in this post.

Best Regards,

Mike

P.S.: Haven't forgotten about your participation in my other post titled “Installing 2nd Certificate from the same CA.” After getting a chance to explore it further, I'll get back to you.

Gurpreet Puri · 12-06-2012

Hi Michael,

ip nhrp registration timeout 120

This command is in my running configuration, and my session shows "never expire".

Regards,
Gurpreet S Puri

****************************
Keep Smiling, Peace
****************************

(Please Rate Helpful Post)


michael.leblanc · 12-06-2012

Gurpreet:

Thank you for the response.

I suspect the entries you are referring to would be "static" entries for the "NHRP Server", rather than "dynamic" entries for the "spokes".

Spoke-A # sh ip nhrp
<Spoke-B-Tunnel-IP-addr>/32 via , Tunnel0 created 00:23:07, expire 00:06:56
Type: dynamic, Flags: router
NBMA address: <Spoke-B-Global-IP-addr>
<NHRP-Server-Tunnel-IP-addr>/32 via , Tunnel0 created 08:54:42, never expire
Type: static, Flags: used
NBMA address: <NHRP-Server-Global-IP-addr>

Per the Cisco document "How to Configure NHRP":

ip nhrp registration timeout 120

Changes the interval that NHRP NHCs send NHRP registration requests to configured NHRP NHSs.
The NHRP registration requests are now sent every 120 seconds (the default value is one third of the NHRP holdtime value).

Observation, with the following parameters:

ip nhrp holdtime 1800
ip nhrp registration timeout 300

The expiry timers for the dynamic (spoke) entries, as observed on the NHRP Server, count down from 30 min. to 25 min. and are reset to 30 min. when the next registration request arrives.
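That sawtooth is what you would expect from the arithmetic. A minimal Python sketch, assuming an idealised spoke whose registrations (every 300 s) arrive exactly on schedule and each reset the hub's timer to the full 1800 s holdtime; real timers will jitter a little with packet delivery, and the function name is hypothetical:

```python
HOLDTIME = 1800  # ip nhrp holdtime 1800
INTERVAL = 300   # ip nhrp registration timeout 300


def hub_expire_remaining(t, holdtime=HOLDTIME, interval=INTERVAL):
    """Seconds left on the hub's entry t seconds after the spoke's first
    registration, assuming every registration arrives exactly on schedule
    and resets the timer to the full holdtime."""
    return holdtime - (t % interval)


one_cycle = [hub_expire_remaining(t) for t in range(INTERVAL + 1)]
print(max(one_cycle), min(one_cycle))  # 1800 1501: 30 min down to just over 25 min
```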

Marcin's last response requires some investigation. A second (non-crypto) tunnel does exist, so traffic would be load shared across both tunnels. I had expected there to be enough NTP traffic to keep the crypto tunnel up, but I need to determine whether that is actually happening.

Thank you for your interest in this post, Gurpreet.

Best Regards

Mike

Marcin Latosiewicz · 12-07-2012

Mike,

Cool stuff. You're actually one of the few people who take the time to READ the documentation, and I know it's sometimes hard to find THE RIGHT info you need.

M.

michael.leblanc · 12-07-2012

Marcin:

Thank you for the response, and feedback.

Unfortunately, my last reply at Dec 7, 2012 1:15 AM, which contained the solution and was most likely the one you intended to respond to, wasn't the one that received your generous rating, which draws attention away from it.

I thought I was doing a good thing by replying to your inquiry about the NTP peering, rather than the most recent reply in the thread. Haven't participated here for a while, so I'm not sure if there is a common practice that is adhered to.

Am I generally better off just replying to the last reply in a thread, rather than inserting it where I think it might best fit?

Also noticed that as the initiator of the post, I'm unable to mark the thread as answered, because the solution is within one of my own replies.

If you want to post a reply that references the panel containing the solution (Dec 7, 2012 1:15 AM), I'll mark it as the "Correct Answer" so that the thread gets tagged accordingly.

Best Regards,

Mike

Marcin Latosiewicz · 12-10-2012

Mike,

I cannot take away helpful points from any post so I gave both posts a 5 star rating ;-)

Indeed, it's a limitation (or an anti-abuse measure) that people are not allowed to tag their own posts as solutions.

For the rest, I think people will be able to find the solution if they can Google and use "find" in their browsers.

M.