DMVPN Issues on Cisco IOS XE 3.16.03.S.155-3.S3 - Issue found to be NTP

ktwilli888
Level 1

All,

Be careful running DMVPN on the new "Safe Harbor" release of IOS XE, 3.16.03.S.155-3.S3. I recently upgraded the IOS on all of my routers from 03.16.02.S.155-3.S2. After approximately 58 hours on the new code, all spoke sites started losing their spoke-to-spoke connectivity and then experienced full site outages.

The only traffic that could pass from hub to spoke was traffic sourced from the NBMA address, the physical interface, or the loopback interface sourcing the DMVPN tunnel. Spoke-to-hub traffic failed.

After a reload, the sites would return to operation and then fail again after another 58 hours or so. I then performed an IOS rollback and left one site on 3.16.3 (after changing some timer configs for troubleshooting). That site failed at approximately 62 hours.

We have had a TAC case open for the last two weeks, and yesterday they built our DMVPN backbone in a lab.

TAC did identify that after a certain period the spoke router was not processing the NHRP registration reply from the Hub. The Hub sent the reply, but the spoke did not see the reply when we were debugging the issue.

I will keep everyone posted. If you have questions, feel free to ask. Just wanted to post this in the event someone else is having the issue. I could not find any boards on the topic.

Ken 

5 Replies

pavel.skovajsa
Level 1

Hello Ken,

did you get anywhere with this one?

Pavel,

I apologize for the delayed response. We have been continuing to work this issue. TAC noticed that the interface buffers on the tunnel interface were clogged (376/375). The engineer was able to dig up a Cisco bug: code 3.16.3S cannot process NTP packets properly.

CSCva35619

I was stunned that it was an NTP issue on the code version, and even more stunned that this code version is considered Safe Harbor. I removed the "NTP server x.x.x.x" configurations from the router and it remained operational. I was only able to test this for 3 days because they were scheduling this site for decommission. The interface buffers were still clear at the time of circuit deactivation.

I would NOT go to this code version. 

Again, my apologies for the delay.

Kenneth

Caveat,

I was actually lucky that I was using DMVPN. The NTP traffic was sourced from the tunnel interface instead of the physical external interface. This is why I could still reach the router's NBMA and physical interfaces, because it was the buffer on the tunnel interface that was getting clogged. 

Ken

Thanks for the feedback. Coincidentally, we were running the same release, so I was worried. It turns out we have the "ntp access-group peer ACL#" command configured, so that probably saved us.

-pavel

Pavel,

This is what our interface input queue looked like. It just took 50 hours or so to fill the queue. We had 2033 drops because we were able to test with this site before we decommissioned it. 

Input queue: 376/375/2033/0 (size/max/drops/flushes); Total output drops: 0
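
For anyone who wants to watch for this condition on their own routers, here is a rough Python sketch that parses an "Input queue" line like the one above and flags a queue that has reached its maximum depth. The function names and the "size at or above max" check are my own assumptions for illustration, not anything from TAC or the bug notes, and collecting the interface output from the router is left out entirely.

```python
import re

# Matches the counter line quoted in this thread, e.g.
# "Input queue: 376/375/2033/0 (size/max/drops/flushes); Total output drops: 0"
INPUT_QUEUE_RE = re.compile(
    r"Input queue:\s*(\d+)/(\d+)/(\d+)/(\d+)\s*\(size/max/drops/flushes\)"
)


def parse_input_queue(text):
    """Return the input-queue counters as a dict, or None if the line is absent."""
    match = INPUT_QUEUE_RE.search(text)
    if match is None:
        return None
    size, maximum, drops, flushes = (int(g) for g in match.groups())
    return {"size": size, "max": maximum, "drops": drops, "flushes": flushes}


def is_wedged(counters):
    """Hypothetical heuristic: treat a queue at or above its max depth as stuck."""
    return counters["size"] >= counters["max"]


if __name__ == "__main__":
    line = ("Input queue: 376/375/2033/0 (size/max/drops/flushes); "
            "Total output drops: 0")
    counters = parse_input_queue(line)
    print(counters)
    print("wedged:", is_wedged(counters))
```

A single full reading can also happen during a normal traffic burst, so in practice you would probably want to see the size stay at the maximum across several polls before treating it as this problem.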
