Solved: Re: After ISP 4 minute flap, why did the DMVPN tunnel stay offline?

jmaxwellUSAF · ‎07-21-2023

Hello.

Our spoke ISP failed for 4 minutes. Once it returned, our C2921 DMVPN spoke remained in the NHRP state (no production traffic flowed from this spoke to the hub). This was remediated by shutting then no-shutting the spoke tunnel interface. (It was fortunate I was physically at the branch spoke to fix this.)

QUESTIONS:

1. Why did this spoke not return to "UP" status when the ISP link became healthy?

2. What can be done so that this symptom does not re-occur?

Thank you.

Peter Paluch · ‎07-21-2023

Hello,

It's better not to "simplify" scenarios during troubleshooting like this because details like these may change the whole story completely.

Initially, assuming that all you did was to flap the Tunnel interface and did not perform further config changes, I suspected the NHRP registration interval to have collided in an unfortunate way with the ISP outage. Since it is 200 seconds by default, if the registration fails, it will take up to 200 seconds for the router to register to the hub again. That could have explained it if the registration fell into the 4-minute outage of the ISP (and if Gi1/0 didn't go down, which I only learned when you shared the logs). Very importantly, if Tunnel2 came up just by flapping it but keeping Gi1/0 as the source interface, it would have confirmed that the internet connectivity through Gi1/0 worked after the ISP came back.
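For reference, the NHRP registration timing discussed above is configurable per tunnel interface. A minimal sketch, assuming the spoke's DMVPN interface is Tunnel2 as in this thread; the timer values are illustrative assumptions, not recommendations, and they do not come from the poster's configuration:

```
! Hypothetical values, for illustration only
interface Tunnel2
 ! How long the hub should keep this spoke's NHRP mapping (seconds)
 ip nhrp holdtime 300
 ! How often the spoke re-sends its NHRP registration to the hub (seconds)
 ip nhrp registration timeout 60
```

A shorter registration timeout would shrink the window in which a missed registration leaves the spoke unregistered, at the cost of more control traffic toward the hub.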

However, you have changed the source interface and only then flapped the Tunnel interface. This means that we cannot assume anything about the apparently restored connectivity through the ISP on Gi1/0. For what it is worth, just because Gi1/0 came back up does not mean that the internet was actually reachable through it.

So the fact that Tunnel2 became operable after you changed the source interface opens a whole set of questions on how the connectivity was restored, if at all, through Gi1/0. It is not possible to say with certainty whether the problem was NHRP or the connectivity through Gi1/0. I suspect for now that when Gi1/0 came up, it still did not provide connectivity through that ISP to the internet. Why that would be the case is something I can't say without seeing the full show logging and full show running-config, because there are too many unknowns, and we cannot afford to assume.

All depends now on whether it is possible to share the following full outputs (no line may be removed, only sensitive data replaced with safe placeholders):

- show logging
- show running-config
- show ip interface brief
- show ip protocols
- show ip route
- show ip route vrf *
- show ip arp

The reason I am asking for this information is that I need to understand the momentary runtime state of this router, whether it appears to have at least local connectivity to the ISP, and how the routing is set up on it. Changing the source interface on Tun2 would have changed the source IP but not the outgoing interface itself: the outgoing interface is still determined by the routing table based on the destination IP address of the packet, not by the tunnel source command.
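One way to test that reasoning on the spoke is to check which path the router would actually use toward the hub's public (NBMA) address. A sketch, where 198.51.100.1 is a placeholder for the hub's public address (it is not given in this thread):

```
! 198.51.100.1 is a placeholder for the hub's public (NBMA) address
show ip route 198.51.100.1
show ip cef 198.51.100.1
! Test reachability while sourcing the packets from the primary ISP interface
ping 198.51.100.1 source GigabitEthernet1/0
```

If the route and CEF entry point out of an interface other than the configured tunnel source, the tunnel packets leave through that interface regardless of the tunnel source command.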

If those outputs cannot be shared, I'm afraid this is as far as we can get.

Best regards,
Peter


Peter Paluch · ‎07-21-2023

Hello,

May I ask a few questions to clarify the issue?

1. What exactly was down after the ISP recovered? Was the Tunnel interface "down, line protocol down" or "up, line protocol down"? Or was it some NHRP state that was unresolved or empty? Let's be very precise regarding this.

2. Was it purely the shut / no shut on the Tunnel interface that resolved this issue?

3. Would it be possible for you to share the show logging from the spoke C2921, including a few lines before, all lines during, and a few lines after the event with the ISP?

4. Would it be possible for you to share a sanitized but still full configuration of the Tunnel interface on the spoke C2921?
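To answer the first question precisely, the following show commands are commonly used on a DMVPN spoke. This is only a sketch of what to collect; the exact output depends on the IOS release, and the crypto commands assume an IKEv1 (ISAKMP) deployment:

```
! Interface and line-protocol state of the tunnel
show interfaces Tunnel2
! Per-peer DMVPN state (for example UP or NHRP)
show dmvpn detail
! Registration state toward the next-hop server (the hub)
show ip nhrp nhs detail
! IKEv1 and IPsec security associations
show crypto isakmp sa
show crypto ipsec sa
```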

Thank you!

Best regards,
Peter

jmaxwellUSAF · ‎07-21-2023

(obfuscated)
!! Discussed tunnel is tunnel2 !!
Jul 21 13:56:02.768: %CRYPTO-4-IKMP_BAD_MESSAGE: IKE message from 18.179.50.34 failed its sanity check or is malformed
*Jul 21 13:58:02.728: %CRYPTO-4-IKMP_BAD_MESSAGE: IKE message from 18.179.50.34 failed its sanity check or is malformed
!! 18.179.50.34 is the public IP address of a spoke, probably connected to this spoke via spoke to spoke Tunnel2. !!

!! below-- ISP fails... !!
*Jul 21 13:58:10.000: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0, changed state to down
*Jul 21 13:58:11.000: %LINK-3-UPDOWN: Interface GigabitEthernet1/0, changed state to down
*Jul 21 13:58:11.408: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel3, changed state to down
*Jul 21 13:58:11.408: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel2, changed state to down
*Jul 21 13:58:11.408: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 192.168.3.1 (Tunnel3) is down: interface down
*Jul 21 13:58:11.412: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 192.168.12.1 (Tunnel2) is down: interface down

*Jul 21 14:02:14.000: %LINK-3-UPDOWN: Interface GigabitEthernet1/0, changed state to up
*Jul 21 14:02:15.000: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0, changed state to up
*Jul 21 14:02:21.648: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel3, changed state to up
*Jul 21 14:02:21.648: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel2, changed state to up
!! Tunnel is stuck in NHRP state. !!

!! below is troubleshooting commenced. int tu20 is #shut !!
*Jul 21 14:19:52.316: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel2, changed state to down
*Jul 21 14:19:52.316: %LINK-5-CHANGED: Interface Tunnel2, changed state to administratively down

!! I didnt include this in discussion because I didnt want to complicate discussion, but fact is that at this moment I switched tunnel interface from g0/1 to the backup ISP at g0/4, then executed on tu2 #no shut !!
*Jul 21 14:20:17.896: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel2, changed state to up
*Jul 21 14:20:17.896: %LINK-3-UPDOWN: Interface Tunnel2, changed state to up
*Jul 21 14:20:21.648: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 192.168.12.1 (Tunnel2) is up: new adjacency

*Jul 21 14:23:05.644: %CRYPTO-4-IKMP_BAD_MESSAGE: IKE message from 18.179.50.34 failed its sanity check or is malformed
*Jul 21 14:24:05.696: %CRYPTO-4-IKMP_BAD_MESSAGE: IKE message from 18.179.50.34 failed its sanity check or is malformed

---

(obfuscated)
#sh int tu2
Tunnel2 is up, line protocol is up
Hardware is Tunnel
Description: Spoke1
Internet address is 192.168.12.26/24
MTU 17912 bytes, BW 250000 Kbit/sec, DLY 20000 usec,
reliability 255/255, txload 2/255, rxload 1/255
Encapsulation TUNNEL, loopback not set
Keepalive not set
Tunnel linestate evaluation up
Tunnel source 1.2.3.4 (GigabitEthernet0/4)
Tunnel Subblocks:
src-track:
Tunnel2 source tracking subblock associated with GigabitEthernet0/4
Set of tunnels with source GigabitEthernet0/4, 1 member (includes iterators), on interface <OK>
Tunnel protocol/transport multi-GRE/IP
Key 0x14, sequencing disabled
Checksumming of packets disabled
Tunnel TTL 255, Fast tunneling enabled
Tunnel transport MTU 1472 bytes
Tunnel transmit bandwidth 8000 (kbps)
Tunnel receive bandwidth 8000 (kbps)
Tunnel protection via IPSec (profile "myprofile")
Last input 00:00:01, output never, output hang never
Last clearing of "show interface" counters 29w3d
Input queue: 0/75/96/0 (size/max/drops/flushes); Total output drops: 7527297
Queueing strategy: fifo
Output queue: 0/0 (size/max)
5 minute input rate 1350000 bits/sec, 408 packets/sec
5 minute output rate 2741000 bits/sec, 453 packets/sec
473313393 packets input, 1659536957 bytes, 0 no buffer
Received 0 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
1155234215 packets output, 1291688580 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 output buffer failures, 0 output buffers swapped out

MHM Cisco World · ‎07-21-2023

First, since you changed the tunnel source, try clearing NHRP on the hub, then clear crypto isakmp and clear crypto ipsec sa on both hub and spoke.

If the above does not work, share the config of the spoke.
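On an IOS router, the steps above would look roughly like the following. This is a sketch, not taken from the poster, and the exact forms are assumptions about the intended commands. In particular, IOS routers use clear crypto sa, whereas clear crypto ipsec sa is the ASA form. Issued without arguments, these reset every dynamic mapping or SA on the device, so they are disruptive.

```
! On the hub: flush dynamically learned NHRP mappings
clear ip nhrp
! On hub and spoke: tear down IKEv1 (ISAKMP) security associations
clear crypto isakmp
! On hub and spoke: tear down IPsec security associations
clear crypto sa
```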

jmaxwellUSAF · ‎07-21-2023

Thank you MHM.

The situation is now healthy. The reason for the post is that my boss does not want this to happen again, so I am trying to understand why it happened. I cannot execute any new configs right now.

MHM Cisco World · ‎07-22-2023

This is the healthy UP status.

When you change the ISP (without no-unique registration configured under the tunnel), the hub receives an NHRP registration with the same spoke tunnel IP but a different tunnel source IP. This makes the hub refuse the registration, and on the spoke tunnel you see the status NHRP.

Solution:

ip nhrp registration no-unique

NOTE: the above command is only needed if the spoke gets its IP from the ISP dynamically (DHCP or PPP). In your case, since you changed the ISP IP only once, there is no need for it; just clear NHRP on the hub as I mentioned above.
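A minimal sketch of where that command goes on the spoke, assuming the DMVPN interface is Tunnel2 as in this thread. On IOS the keyword is no-unique; the rest of the tunnel configuration is omitted because it was not shared:

```
interface Tunnel2
 ! Register without the NHRP "unique" flag, so the hub may overwrite
 ! the spoke's mapping when its public (NBMA) address changes
 ip nhrp registration no-unique
```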

MHM Cisco World · ‎07-24-2023

Note: in my lab I changed the ISP IP, and you can see the status change from UP to NHRP.

jmaxwellUSAF · ‎07-24-2023

If you go to the ISP router device, shut the interface for four minutes, then no-shut it, what is the resulting tunnel status on the spoke DMVPN device?

MHM Cisco World · ‎07-24-2023

As I mentioned, if you get the IP via DHCP from the ISP and the IP changes, then this can happen if you don't configure registration no-unique.

jmaxwellUSAF · ‎07-24-2023

IPs are static. The ISP routing device sits in a cage and we cannot log into it, but we know its inside public IP address, and our DMVPN spoke has a public IP address in the same subnet.

As described, when the ISP came back online 4 minutes later, the tunnel was stuck in the NHRP state.

MHM Cisco World · ‎07-24-2023

Is the IP of the tunnel source static?

jmaxwellUSAF · ‎07-24-2023

yes, static public.

MHM Cisco World · ‎07-24-2023

If the spoke has a static IP, have you configured BGP flap prevention or track with ip sla?
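For readers unfamiliar with the suggestion, a tracked default route driven by IP SLA typically looks like the sketch below. Every address and number here is a hypothetical placeholder (documentation ranges), since the thread does not include the spoke's routing configuration:

```
! Probe the primary ISP next hop out of the primary interface
ip sla 10
 icmp-echo 203.0.113.1 source-interface GigabitEthernet1/0
 frequency 10
ip sla schedule 10 life forever start-time now
!
! Track object follows the probe's reachability
track 10 ip sla 10 reachability
!
! Primary default route is installed only while the probe succeeds
ip route 0.0.0.0 0.0.0.0 203.0.113.1 track 10
! Floating backup default route via the second ISP
ip route 0.0.0.0 0.0.0.0 192.0.2.1 250
```

With this in place, an interface that is up but not passing traffic no longer keeps the primary default route installed.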

jmaxwellUSAF · ‎07-25-2023

The ISP routing device in the cage is not administered by us. It has a public IP address on its inside interface, which connects to an L2 switch, which in turn connects to our network.