
DMVPN Phase3 with 4 hubs - Only 2 NHS responding

pavel.skovajsa
Level 1

We have a fully working DMVPN Phase 3 setup where each spoke, with a single Internet uplink, has 4 mGRE tunnels to 4 different hubs (for redundancy). All tunnels are passing traffic just fine, but for some reason only 2 NHSs ever respond to NHRP requests.

spoke#show ip nhrp nhs  detail 
Legend: E=Expecting replies, R=Responding, W=Waiting
Tunnel0:
172.18.128.1   E NBMA Address: 1.1.1.1 priority = 0 cluster = 0  req-sent 8  req-failed 0  repl-recv 0 (3w1d ago)

Tunnel1:
172.18.16.1  RE NBMA Address: 2.2.2.2 priority = 0 cluster = 0  req-sent 47680  req-failed 0  repl-recv 47667 (00:00:15 ago)

Tunnel2:
172.18.160.1   E NBMA Address: 3.3.3.3 priority = 0 cluster = 0  req-sent 153533  req-failed 0  repl-recv 0 

Tunnel3:
172.18.208.1  RE NBMA Address: 4.4.4.4 priority = 0 cluster = 0  req-sent 47604  req-failed 0  repl-recv 47602 (00:00:39 ago)

Pending Registration Requests:
Registration Request: Reqid 134, Ret 64  NHS 172.18.160.1 expired (Tu2) 
Registration Request: Reqid 23618, Ret 64  NHS 172.18.128.1 expired (Tu0) 

Interestingly, when we shut down Tun1 and then shut/no shut Tun0, the Tun0 NHS suddenly starts responding. In the end there are never more than 2 NHSs responding at any one time, which is definitely not what we would expect in a normal situation. We can reproduce the same behavior in the lab. Do you think this is "normal"?

Debugging the NHRP NHS packets is also very interesting: it shows that the NHS actually responds, but the response packet arrives on the wrong mGRE tunnel!

Here is a debug from a "working" Tunnel interface:

Aug 31 15:18:54.214 GMT: NHRP: Send Registration Request via Tunnel1 vrf MIAB100(0x3), packet size: 107
Aug 31 15:18:54.214 GMT:  src: 172.18.16.5, dst: 172.18.16.1
Aug 31 15:18:54.214 GMT:  (F) afn: AF_IP(1), type: IP(800), hop: 255, ver: 1
Aug 31 15:18:54.214 GMT:      shtl: 4(NSAP), sstl: 0(NSAP)
Aug 31 15:18:54.214 GMT:      pktsz: 107 extoff: 52
Aug 31 15:18:54.214 GMT:  (M) flags: "unique nat ", reqid: 30303
Aug 31 15:18:54.214 GMT:      src NBMA: 44.44.44.44
Aug 31 15:18:54.214 GMT:      src protocol: 172.18.16.5, dst protocol: 172.18.16.1
Aug 31 15:18:54.214 GMT:  (C-1) code: no error(0)
Aug 31 15:18:54.214 GMT:        prefix: 32, mtu: 9972, hd_time: 120
Aug 31 15:18:54.214 GMT:        addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 255
Aug 31 15:18:54.235 GMT: NHRP: Receive Registration Reply via Tunnel1 vrf MIAB100(0x3), packet size: 127
Aug 31 15:18:54.235 GMT:  (F) afn: AF_IP(1), type: IP(800), hop: 255, ver: 1
Aug 31 15:18:54.235 GMT:      shtl: 4(NSAP), sstl: 0(NSAP)
Aug 31 15:18:54.235 GMT:      pktsz: 127 extoff: 52
Aug 31 15:18:54.235 GMT:  (M) flags: "unique nat ", reqid: 30303
Aug 31 15:18:54.235 GMT:      src NBMA: 44.44.44.44
Aug 31 15:18:54.235 GMT:      src protocol: 172.18.16.5, dst protocol: 172.18.16.1
Aug 31 15:18:54.235 GMT:  (C-1) code: no error(0)
Aug 31 15:18:54.235 GMT:        prefix: 32, mtu: 9972, hd_time: 120
Aug 31 15:18:54.235 GMT:        addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 255

Here is a debug from a "non-responding" Tunnel interface (the request goes out via Tunnel0, but the reply comes back via Tunnel1):

Aug 31 15:18:58.034 GMT: NHRP: Send Registration Request via Tunnel0 vrf MIAB100(0x3), packet size: 107
Aug 31 15:18:58.034 GMT:  src: 172.18.128.5, dst: 172.18.128.1
Aug 31 15:18:58.034 GMT:  (F) afn: AF_IP(1), type: IP(800), hop: 255, ver: 1
Aug 31 15:18:58.034 GMT:      shtl: 4(NSAP), sstl: 0(NSAP)
Aug 31 15:18:58.034 GMT:      pktsz: 107 extoff: 52
Aug 31 15:18:58.034 GMT:  (M) flags: "unique nat ", reqid: 30304
Aug 31 15:18:58.034 GMT:      src NBMA: 44.44.44.44
Aug 31 15:18:58.034 GMT:      src protocol: 172.18.128.5, dst protocol: 172.18.128.1
Aug 31 15:18:58.034 GMT:  (C-1) code: no error(0)
Aug 31 15:18:58.034 GMT:        prefix: 32, mtu: 9972, hd_time: 120
Aug 31 15:18:58.034 GMT:        addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 255
Aug 31 15:18:58.076 GMT: NHRP: Receive Registration Reply via Tunnel1 vrf MIAB100(0x3), packet size: 127
Aug 31 15:18:58.076 GMT:  (F) afn: AF_IP(1), type: IP(800), hop: 255, ver: 1
Aug 31 15:18:58.076 GMT:      shtl: 4(NSAP), sstl: 0(NSAP)
Aug 31 15:18:58.077 GMT:      pktsz: 127 extoff: 52
Aug 31 15:18:58.077 GMT:  (M) flags: "unique nat ", reqid: 30304
Aug 31 15:18:58.077 GMT:      src NBMA: 44.44.44.44
Aug 31 15:18:58.077 GMT:      src protocol: 172.18.128.5, dst protocol: 172.18.128.1
Aug 31 15:18:58.077 GMT:  (C-1) code: no error(0)
Aug 31 15:18:58.077 GMT:        prefix: 32, mtu: 9972, hd_time: 120
Aug 31 15:18:58.077 GMT:        addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 255

pavel.skovajsa
Level 1

For those who are interested: I worked with TAC on this, and they were very helpful. It turns out the issue above was caused by us somehow ending up with the same GRE tunnel key on pairs of tunnels (Tun0+Tun1 and Tun2+Tun3) that shared the same source interface.

This is a known limitation that I had completely forgotten about. When several tunnels share a source interface, the router cannot tell which tunnel an incoming GRE packet belongs to (even though the crypto proxy ACLs for each tunnel are different), so it relies on the GRE tunnel key to tell them apart.
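
A minimal sketch of what the fix could look like on a spoke, assuming two mGRE tunnels that share one source interface. The interface name and key values here are purely illustrative and do not come from the original configuration; the only point is that each tunnel on a given source interface carries a different tunnel key, and the corresponding hub tunnel would need the matching key.

```
! Illustrative values only (not taken from the original config)
interface Tunnel0
 tunnel source GigabitEthernet0/0
 tunnel mode gre multipoint
 tunnel key 100
!
interface Tunnel1
 tunnel source GigabitEthernet0/0
 tunnel mode gre multipoint
 tunnel key 101
```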

This helped me resolve an issue with DMVPN tunnels in IWAN, where the Internet connections were commodity connections from a single provider that didn't provide a static IP.
The Internet provider's routers were all handing out the same internal IP, 10.1.10.10.
My interface was set to vrf INET with ip address dhcp, plus a default route of ip route vrf INET *default* dhcp.
I had several of these devices configured this way, but their internet tunnel wouldn't stay up.
"show dmvpn" would often show the static connections in state "NHRP" instead of "UP".
I believe what was happening is this: Site_A connects to the data center router and builds its IPsec tunnel with its unique Internet IP, then GRE uses the 10.1.10.10 internal IP to build its tunnel. When Site_B connects, the router builds its IPsec tunnel with its own unique Internet IP, but when it goes to build the GRE tunnel, the data center router says "I already have a 10.1.10.10 registered, and you're not it."
The solution is to use a unique static internal IP on each of the routers.
Instead of ip address dhcp, use ip address 10.1.10.20 for Site_A and 10.1.10.21 for Site_B.
If you have DSL service that gives you a 192.168.1.x IP, make sure each site's address is unique, for example:
Site_A 192.168.1.20
Site_B 192.168.1.21
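
As a rough sketch of the Site_A side, assuming the 10.1.10.x case above: the interface name, subnet mask, VRF command form, and the 10.1.10.1 gateway are all assumptions for illustration (with a static address, the DHCP-learned default route would presumably need to be replaced by a static one pointing at whatever the provider's router actually is).

```
! Site_A (interface name, mask and gateway are assumptions)
interface GigabitEthernet0/0
 vrf forwarding INET
 ip address 10.1.10.20 255.255.255.0
!
! Static default route in the INET VRF, replacing the DHCP-learned one
ip route vrf INET 0.0.0.0 0.0.0.0 10.1.10.1
```

Site_B would be identical apart from using 10.1.10.21.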

Thank you for following up with your solution. I'm sure this information has helped quite a few others like myself!
Stas