08-31-2016 08:26 AM - edited 03-05-2019 04:36 AM
We have a fully working DMVPN Phase 3 setup in which each spoke, with a single internet uplink, has 4 mGRE tunnels to 4 different hubs (for redundancy). Everything works and all tunnels pass traffic fine, but for some reason only 2 of the NHSs ever respond to NHRP registration requests.
spoke#show ip nhrp nhs detail
Legend: E=Expecting replies, R=Responding, W=Waiting
Tunnel0:
172.18.128.1 E NBMA Address: 1.1.1.1 priority = 0 cluster = 0
req-sent 8 req-failed 0 repl-recv 0 (3w1d ago)
Tunnel1:
172.18.16.1 RE NBMA Address: 2.2.2.2 priority = 0 cluster = 0
req-sent 47680 req-failed 0 repl-recv 47667 (00:00:15 ago)
Tunnel2:
172.18.160.1 E NBMA Address: 3.3.3.3 priority = 0 cluster = 0
req-sent 153533 req-failed 0 repl-recv 0
Tunnel3:
172.18.208.1 RE NBMA Address: 4.4.4.4 priority = 0 cluster = 0
req-sent 47604 req-failed 0 repl-recv 47602 (00:00:39 ago)
Pending Registration Requests:
Registration Request: Reqid 134, Ret 64 NHS 172.18.160.1 expired (Tu2)
Registration Request: Reqid 23618, Ret 64 NHS 172.18.128.1 expired (Tu0)
Interestingly, when we shut down Tunnel1 and then shut/no shut Tunnel0, the Tunnel0 NHS suddenly starts responding. In the end, there are never more than 2 NHSs responding at the same time. This is definitely not what we would expect in a normal situation, and we can reproduce the same behavior in the lab. Do you think this is "normal"?
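To spot the "only 2 NHSs responding" pattern without reading the output by eye, the `show ip nhrp nhs detail` output can be scanned for the R (Responding) flag. This is a minimal Python sketch, assuming the field layout shown in this post (tunnel name, NHS address, flags, `NBMA Address:`, then the `req-sent`/`repl-recv` counters); `NhsStatus` and `parse_nhs_detail` are hypothetical helper names, and other IOS releases may format the output differently.

```python
import re
from typing import List, NamedTuple

# Assumes the output layout shown in this thread; other IOS versions may differ.
# \s+ between fields lets it match both line-broken and flattened output.
NHS_RE = re.compile(
    r"(?P<tunnel>Tunnel\d+):\s+(?P<nhs>\d+(?:\.\d+){3})\s+(?P<flags>[ERW]+)\s+"
    r"NBMA Address:\s+(?P<nbma>\d+(?:\.\d+){3}).*?"
    r"req-sent\s+(?P<sent>\d+)\s+req-failed\s+\d+\s+repl-recv\s+(?P<recv>\d+)",
    re.DOTALL,
)


class NhsStatus(NamedTuple):
    tunnel: str
    nhs: str
    nbma: str
    flags: str
    req_sent: int
    repl_recv: int

    @property
    def responding(self) -> bool:
        # "R" in the flags column means the NHS is answering registrations.
        return "R" in self.flags


def parse_nhs_detail(text: str) -> List[NhsStatus]:
    """Extract one NhsStatus per tunnel/NHS entry in the show output."""
    return [
        NhsStatus(m["tunnel"], m["nhs"], m["nbma"], m["flags"],
                  int(m["sent"]), int(m["recv"]))
        for m in NHS_RE.finditer(text)
    ]


SAMPLE = """\
Tunnel0:
172.18.128.1 E NBMA Address: 1.1.1.1 priority = 0 cluster = 0
req-sent 8 req-failed 0 repl-recv 0 (3w1d ago)
Tunnel1:
172.18.16.1 RE NBMA Address: 2.2.2.2 priority = 0 cluster = 0
req-sent 47680 req-failed 0 repl-recv 47667 (00:00:15 ago)
Tunnel2:
172.18.160.1 E NBMA Address: 3.3.3.3 priority = 0 cluster = 0
req-sent 153533 req-failed 0 repl-recv 0
Tunnel3:
172.18.208.1 RE NBMA Address: 4.4.4.4 priority = 0 cluster = 0
req-sent 47604 req-failed 0 repl-recv 47602 (00:00:39 ago)
"""

if __name__ == "__main__":
    for s in parse_nhs_detail(SAMPLE):
        state = "responding" if s.responding else "NOT responding"
        print(f"{s.tunnel} NHS {s.nhs} ({s.nbma}): {state}, "
              f"sent={s.req_sent}, recv={s.repl_recv}")
```

Run against the output in this post, it reports Tunnel1 and Tunnel3 as responding and Tunnel0 and Tunnel2 as not responding.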
Debugging the NHRP NHS packets is also revealing: it shows that the NHS actually does respond, but the reply arrives on the wrong mGRE tunnel (the reply to the Tunnel0 request with reqid 30304 is received via Tunnel1).
Here is the debug from a "working" tunnel interface (Tunnel1):
Aug 31 15:18:54.214 GMT: NHRP: Send Registration Request via Tunnel1 vrf MIAB100(0x3), packet size: 107
Aug 31 15:18:54.214 GMT: src: 172.18.16.5, dst: 172.18.16.1
Aug 31 15:18:54.214 GMT: (F) afn: AF_IP(1), type: IP(800), hop: 255, ver: 1
Aug 31 15:18:54.214 GMT: shtl: 4(NSAP), sstl: 0(NSAP)
Aug 31 15:18:54.214 GMT: pktsz: 107 extoff: 52
Aug 31 15:18:54.214 GMT: (M) flags: "unique nat ", reqid: 30303
Aug 31 15:18:54.214 GMT: src NBMA: 44.44.44.44
Aug 31 15:18:54.214 GMT: src protocol: 172.18.16.5, dst protocol: 172.18.16.1
Aug 31 15:18:54.214 GMT: (C-1) code: no error(0)
Aug 31 15:18:54.214 GMT: prefix: 32, mtu: 9972, hd_time: 120
Aug 31 15:18:54.214 GMT: addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 255
Aug 31 15:18:54.235 GMT: NHRP: Receive Registration Reply via Tunnel1 vrf MIAB100(0x3), packet size: 127
Aug 31 15:18:54.235 GMT: (F) afn: AF_IP(1), type: IP(800), hop: 255, ver: 1
Aug 31 15:18:54.235 GMT: shtl: 4(NSAP), sstl: 0(NSAP)
Aug 31 15:18:54.235 GMT: pktsz: 127 extoff: 52
Aug 31 15:18:54.235 GMT: (M) flags: "unique nat ", reqid: 30303
Aug 31 15:18:54.235 GMT: src NBMA: 44.44.44.44
Aug 31 15:18:54.235 GMT: src protocol: 172.18.16.5, dst protocol: 172.18.16.1
Aug 31 15:18:54.235 GMT: (C-1) code: no error(0)
Aug 31 15:18:54.235 GMT: prefix: 32, mtu: 9972, hd_time: 120
Aug 31 15:18:54.235 GMT: addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 255
Here is the debug from a "non-responding" tunnel interface (Tunnel0):
Aug 31 15:18:58.034 GMT: NHRP: Send Registration Request via Tunnel0 vrf MIAB100(0x3), packet size: 107
Aug 31 15:18:58.034 GMT: src: 172.18.128.5, dst: 172.18.128.1
Aug 31 15:18:58.034 GMT: (F) afn: AF_IP(1), type: IP(800), hop: 255, ver: 1
Aug 31 15:18:58.034 GMT: shtl: 4(NSAP), sstl: 0(NSAP)
Aug 31 15:18:58.034 GMT: pktsz: 107 extoff: 52
Aug 31 15:18:58.034 GMT: (M) flags: "unique nat ", reqid: 30304
Aug 31 15:18:58.034 GMT: src NBMA: 44.44.44.44
Aug 31 15:18:58.034 GMT: src protocol: 172.18.128.5, dst protocol: 172.18.128.1
Aug 31 15:18:58.034 GMT: (C-1) code: no error(0)
Aug 31 15:18:58.034 GMT: prefix: 32, mtu: 9972, hd_time: 120
Aug 31 15:18:58.034 GMT: addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 255
Aug 31 15:18:58.076 GMT: NHRP: Receive Registration Reply via Tunnel1 vrf MIAB100(0x3), packet size: 127
Aug 31 15:18:58.076 GMT: (F) afn: AF_IP(1), type: IP(800), hop: 255, ver: 1
Aug 31 15:18:58.076 GMT: shtl: 4(NSAP), sstl: 0(NSAP)
Aug 31 15:18:58.077 GMT: pktsz: 127 extoff: 52
Aug 31 15:18:58.077 GMT: (M) flags: "unique nat ", reqid: 30304
Aug 31 15:18:58.077 GMT: src NBMA: 44.44.44.44
Aug 31 15:18:58.077 GMT: src protocol: 172.18.128.5, dst protocol: 172.18.128.1
Aug 31 15:18:58.077 GMT: (C-1) code: no error(0)
Aug 31 15:18:58.077 GMT: prefix: 32, mtu: 9972, hd_time: 120
Aug 31 15:18:58.077 GMT: addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 255
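The misrouted reply is easy to miss when reading the debug by hand. This is a minimal Python sketch that pairs each Send/Receive header line with the `reqid` that follows it and reports replies received on a different tunnel than the request was sent on. It assumes the debug line layout shown above (header line first, `reqid:` on a later line); `find_misrouted_replies` is a hypothetical helper name, and the sample is just the relevant lines trimmed from the two debugs in this post.

```python
import re
from typing import Dict, List, Optional, Tuple

# Header line of each NHRP registration packet in the debug output.
HEADER_RE = re.compile(
    r"NHRP: (?P<dir>Send|Receive) Registration (?:Request|Reply) "
    r"via (?P<tunnel>Tunnel\d+)"
)
REQID_RE = re.compile(r"reqid: (?P<reqid>\d+)")


def find_misrouted_replies(log: str) -> List[Tuple[int, str, str]]:
    """Return (reqid, sent_via, received_via) for each registration reply
    that arrived on a different tunnel than its request was sent on."""
    sent_via: Dict[int, str] = {}
    mismatches: List[Tuple[int, str, str]] = []
    current: Optional[Tuple[str, str]] = None  # (direction, tunnel) of last header
    for line in log.splitlines():
        header = HEADER_RE.search(line)
        if header:
            current = (header["dir"], header["tunnel"])
            continue
        found = REQID_RE.search(line)
        if found and current:
            reqid = int(found["reqid"])
            direction, tunnel = current
            if direction == "Send":
                sent_via[reqid] = tunnel
            elif reqid in sent_via and sent_via[reqid] != tunnel:
                mismatches.append((reqid, sent_via[reqid], tunnel))
            current = None  # only the first reqid after a header belongs to it
    return mismatches


SAMPLE_LOG = """\
Aug 31 15:18:54.214 GMT: NHRP: Send Registration Request via Tunnel1 vrf MIAB100(0x3), packet size: 107
Aug 31 15:18:54.214 GMT: (M) flags: "unique nat ", reqid: 30303
Aug 31 15:18:54.235 GMT: NHRP: Receive Registration Reply via Tunnel1 vrf MIAB100(0x3), packet size: 127
Aug 31 15:18:54.235 GMT: (M) flags: "unique nat ", reqid: 30303
Aug 31 15:18:58.034 GMT: NHRP: Send Registration Request via Tunnel0 vrf MIAB100(0x3), packet size: 107
Aug 31 15:18:58.034 GMT: (M) flags: "unique nat ", reqid: 30304
Aug 31 15:18:58.076 GMT: NHRP: Receive Registration Reply via Tunnel1 vrf MIAB100(0x3), packet size: 127
Aug 31 15:18:58.077 GMT: (M) flags: "unique nat ", reqid: 30304
"""

if __name__ == "__main__":
    for reqid, sent, received in find_misrouted_replies(SAMPLE_LOG):
        print(f"reqid {reqid}: request sent via {sent}, reply received via {received}")
```

On this excerpt it flags only reqid 30304 (sent via Tunnel0, reply received via Tunnel1), which matches the symptom described above.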
09-19-2016 11:04 AM
For those who are interested: I worked with TAC on this, and they were very helpful. It turns out the issue was caused by our having somehow ended up with the same GRE tunnel key on each pair of tunnels (Tun0+Tun1 and Tun2+Tun3) that shared the same source interface.
This is a known limitation that I had completely forgotten about. The router uses the GRE tunnel key to decide which tunnel an incoming packet belongs to, so with identical keys on the same source interface it cannot tell the tunnels apart (even though the crypto proxy ACLs for each tunnel are different).
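The fix can be checked before deploying with a small config audit. This is a minimal Python sketch, assuming (per the TAC explanation above) that inbound GRE is mapped to a tunnel interface by tunnel source plus GRE key, so any two tunnels sharing both are ambiguous. The interface name `GigabitEthernet0/0` and the key values are hypothetical, since the thread does not give them; `TunnelCfg` and `find_key_collisions` are illustrative names, not a Cisco API.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class TunnelCfg(NamedTuple):
    name: str
    source: str          # tunnel source interface
    key: Optional[int]   # GRE tunnel key; None if not configured


def find_key_collisions(
    tunnels: List[TunnelCfg],
) -> Dict[Tuple[str, Optional[int]], List[str]]:
    """Group tunnels by (source, key) and return only the ambiguous groups.

    If inbound GRE is demultiplexed on (source, key), every group with more
    than one member is a set of tunnels the router cannot tell apart.
    """
    groups: Dict[Tuple[str, Optional[int]], List[str]] = defaultdict(list)
    for t in tunnels:
        groups[(t.source, t.key)].append(t.name)
    return {k: names for k, names in groups.items() if len(names) > 1}


UPLINK = "GigabitEthernet0/0"  # hypothetical: the spoke's single internet uplink

# Hypothetical keys reproducing the mistake: Tun0+Tun1 and Tun2+Tun3 share keys.
BROKEN = [
    TunnelCfg("Tunnel0", UPLINK, 100),
    TunnelCfg("Tunnel1", UPLINK, 100),
    TunnelCfg("Tunnel2", UPLINK, 300),
    TunnelCfg("Tunnel3", UPLINK, 300),
]

# One unique key per tunnel on the shared source interface.
FIXED = [
    TunnelCfg("Tunnel0", UPLINK, 100),
    TunnelCfg("Tunnel1", UPLINK, 200),
    TunnelCfg("Tunnel2", UPLINK, 300),
    TunnelCfg("Tunnel3", UPLINK, 400),
]

if __name__ == "__main__":
    for (source, key), names in find_key_collisions(BROKEN).items():
        print(f"{source} key {key}: ambiguous between {', '.join(names)}")
    print("fixed config collisions:", find_key_collisions(FIXED))
```

Giving each tunnel on the shared uplink its own `tunnel key` value, as in `FIXED`, leaves no ambiguous groups.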
09-06-2017 09:23 AM