05-07-2022 03:34 AM
Dear friends,
Good day to everybody. We have run into BUM blackholing in our topology. After some troubleshooting, I think we have narrowed it down to a probable mroute issue on our N3K-C3164Q switches, which run as leafs.
Let's see the topology please:
The 9364C are working as anycast-RPs and underlay routers for the ECMP paths towards the three different VPC domains shown below. The first domain consists of two N3K-C3164Q and the other two domains are N9K-C9396PX. The links between the 9364C and the leafs are Layer 3 multipath ports; every port runs OSPF and PIM in order to provide redundancy in the underlay fabric. Each host shown below is a switchport port-channel towards an end device.
We have detected some unidirectional BUM issues affecting some hosts on the 3164. They can't get ARP replies from each other, so the traffic drops upon ARP expiration and is restored when an ARP request takes place.
We did some diagnosis and think the problem most probably comes down to an mroute issue on the 3164. I'm not sure whether both devices in a VPC domain have to register all multicast sources in their mroute tables (i.e. populate more or less the same table), but our 3164 aren't doing so. The mroute tables match 1:1 between the two switches in each 9396 domain, whereas the 3164 do not register all multicast sources.
Let me show a real example with an underlay group in a 3164.
10.128.122.12 is the secondary IP address on the loopback designated for nve, and 10.128.3.215 is its counterpart on the 9396 side. The 3164-1 does not register 10.128.3.215 as a multicast source.
3164-1# show ip mroute 225.161.215.1
IP Multicast Routing Table for VRF "default"
(*, 225.161.215.1/32), uptime: 00:01:06, nve pim ip
  Incoming interface: Ethernet1/27, RPF nbr: 10.183.161.13
  Outgoing interface list: (count: 1)
    nve1, uptime: 00:01:06, nve
(10.128.122.12/32, 225.161.215.1/32), uptime: 00:01:06, nve mrib pim ip
  Incoming interface: loopback3, RPF nbr: 10.128.122.12
  Outgoing interface list: (count: 1)
    Ethernet1/45, uptime: 00:00:39, pim
3164-1#

3164-2# show ip mroute 225.161.215.1
IP Multicast Routing Table for VRF "default"
(*, 225.161.215.1/32), uptime: 01:42:14, nve pim ip
  Incoming interface: Ethernet1/59, RPF nbr: 10.183.162.17
  Outgoing interface list: (count: 1)
    nve1, uptime: 01:42:14, nve
(10.128.3.215/32, 225.161.215.1/32), uptime: 01:42:12, ip pim mrib
  Incoming interface: Ethernet1/49, RPF nbr: 10.183.162.21
  Outgoing interface list: (count: 1)
    nve1, uptime: 01:42:12, mrib
(10.128.122.12/32, 225.161.215.1/32), uptime: 01:42:14, nve mrib pim ip
  Incoming interface: loopback3, RPF nbr: 10.128.122.12
  Outgoing interface list: (count: 1)
    Ethernet1/55, uptime: 01:41:44, pim
3164-2#
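To check every group rather than one, the same comparison could be scripted. This is only a minimal sketch in Python, assuming the `show ip mroute` output of each VPC peer has been saved as text. The helper names `parse_sources` and `missing_sources` are hypothetical (they are not part of any Cisco tooling), and the sample strings are just the route header lines from the output above.

```python
import re

# Matches route header lines such as
# "(10.128.3.215/32, 225.161.215.1/32), uptime: ...".
# The source is either "*" or an IPv4 address; prefix lengths are optional.
ROUTE_RE = re.compile(
    r"\((\*|\d+\.\d+\.\d+\.\d+)(?:/\d+)?,\s*(\d+\.\d+\.\d+\.\d+)(?:/\d+)?\)"
)


def parse_sources(mroute_text):
    """Map each group to the set of sources that have an (S,G) entry."""
    sources = {}
    for src, grp in ROUTE_RE.findall(mroute_text):
        entry = sources.setdefault(grp, set())
        if src != "*":  # (*,G) entries carry no source address
            entry.add(src)
    return sources


def missing_sources(local, peer):
    """Per group, list sources the peer has registered but the local switch has not."""
    result = {}
    for grp, peer_srcs in peer.items():
        diff = peer_srcs - local.get(grp, set())
        if diff:
            result[grp] = sorted(diff)
    return result


# Route header lines abbreviated from the output above (hypothetical variable names).
OUT_3164_1 = """
(*, 225.161.215.1/32), uptime: 00:01:06, nve pim ip
(10.128.122.12/32, 225.161.215.1/32), uptime: 00:01:06, nve mrib pim ip
"""

OUT_3164_2 = """
(*, 225.161.215.1/32), uptime: 01:42:14, nve pim ip
(10.128.3.215/32, 225.161.215.1/32), uptime: 01:42:12, ip pim mrib
(10.128.122.12/32, 225.161.215.1/32), uptime: 01:42:14, nve mrib pim ip
"""

if __name__ == "__main__":
    # Sources 3164-2 knows about that 3164-1 does not.
    print(missing_sources(parse_sources(OUT_3164_1), parse_sources(OUT_3164_2)))
```

For the sample above this reports 10.128.3.215 as missing on 3164-1 for group 225.161.215.1. Feeding it the full `show ip mroute` output from both peers would list every group where the two tables disagree.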
I think this issue appeared after our upgrade from NX-OS 7 to NX-OS 9, but I can't be sure: I didn't expect this problem before the upgrade, so no mroute output was saved.
The 9396 domains don't have this issue though: both the local and remote sources are registered in the mroute table (sorry, I have to show the output for another multicast group, but the situation is the same for all the others).
9396-1# show ip mroute 225.213.215.3
IP Multicast Routing Table for VRF "default"
(*, 225.213.215.3/32), uptime: 3w1d, nve ip pim
  Incoming interface: Ethernet2/10, RPF nbr: 10.184.213.1
  Outgoing interface list: (count: 1)
    nve1, uptime: 3w1d, nve
(10.128.2.12/32, 225.213.215.3/32), uptime: 3w1d, nve mrib ip pim
  Incoming interface: loopback3, RPF nbr: 10.128.2.12
  Outgoing interface list: (count: 0)
(10.128.3.215/32, 225.213.215.3/32), uptime: 3w1d, ip pim mrib
  Incoming interface: Ethernet2/10, RPF nbr: 10.184.213.1
  Outgoing interface list: (count: 1)
    nve1, uptime: 3w1d, mrib

9396-2# show ip mroute 225.213.215.3
IP Multicast Routing Table for VRF "default"
(*, 225.213.215.3/32), uptime: 3w1d, nve ip pim
  Incoming interface: Ethernet2/10, RPF nbr: 10.184.212.1
  Outgoing interface list: (count: 1)
    nve1, uptime: 3w1d, nve
(10.128.2.12/32, 225.213.215.3/32), uptime: 3w1d, nve mrib ip pim
  Incoming interface: loopback3, RPF nbr: 10.128.2.12
  Outgoing interface list: (count: 1)
    Ethernet2/10, uptime: 01:55:33, pim
(10.128.3.215/32, 225.213.215.3/32), uptime: 3w1d, ip pim mrib
  Incoming interface: Ethernet2/10, RPF nbr: 10.184.212.1
  Outgoing interface list: (count: 1)
    nve1, uptime: 3w1d, mrib
To sum it up we get the following situation on the 3164:
3164-1# show ip multicast vrf default
Multicast Routing VRFs (2 VRFs)
VRF Name    VRF  Table       Route  Group  Source  (*,G)  State
            ID   ID          Count  Count  Count   Count
default     1    0x00000001  240    114    126     113    Up
Multipath configuration (1): s-g-hash
Resilient configuration: Disabled

3164-2# show ip multicast vrf default
Multicast Routing VRFs (2 VRFs)
VRF Name    VRF  Table       Route  Group  Source  (*,G)  State
            ID   ID          Count  Count  Count   Count
default     1    0x00000001  363    114    249     113    Up
Multipath configuration (1): s-g-hash
Resilient configuration: Disabled
The same thing looks considerably better on the 9396:
9396-1# show ip multicast vrf default
Multicast Routing VRFs (3 VRFs)
VRF Name    VRF  Table       Route  Group  Source  (*,G)  State
            ID   ID          Count  Count  Count   Count
default     1    0x00000001  326    109    216     109    Up
Multipath configuration (1): s-g-hash
Resilient configuration: Disabled

9396-2# show ip multicast vrf default
Multicast Routing VRFs (2 VRFs)
VRF Name    VRF  Table       Route  Group  Source  (*,G)  State
            ID   ID          Count  Count  Count   Count
default     1    0x00000001  325    109    215     109    Up
Multipath configuration (1): s-g-hash
Resilient configuration: Disabled
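To put a number on the difference between the two domains, the source counts could be pulled out of the summary row and compared per VPC pair. This is a rough sketch in Python, assuming the `default` row of `show ip multicast vrf default` has been copied from each switch. The names `vrf_counts` and `source_gap` are hypothetical helpers, and the row strings are taken from the outputs above.

```python
import re

# Summary row layout: name, VRF ID, table ID (hex), then route, group,
# source and (*,G) counts, followed by the state.
ROW_RE = re.compile(
    r"(?<!\S)(\S+)\s+(\d+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)"
)


def vrf_counts(text, vrf="default"):
    """Return the route/group/source/(*,G) counts for one VRF row."""
    for m in ROW_RE.finditer(text):
        if m.group(1) == vrf:
            return {
                "route": int(m.group(4)),
                "group": int(m.group(5)),
                "source": int(m.group(6)),
                "star_g": int(m.group(7)),
            }
    raise ValueError(f"no summary row found for VRF {vrf!r}")


def source_gap(peer_a, peer_b):
    """Absolute difference in source count between two VPC peers."""
    return abs(vrf_counts(peer_a)["source"] - vrf_counts(peer_b)["source"])


# Summary rows copied from the outputs above.
ROWS = {
    "3164-1": "default 1 0x00000001 240 114 126 113 Up",
    "3164-2": "default 1 0x00000001 363 114 249 113 Up",
    "9396-1": "default 1 0x00000001 326 109 216 109 Up",
    "9396-2": "default 1 0x00000001 325 109 215 109 Up",
}

if __name__ == "__main__":
    print("3164 pair source gap:", source_gap(ROWS["3164-1"], ROWS["3164-2"]))
    print("9396 pair source gap:", source_gap(ROWS["9396-1"], ROWS["9396-2"]))
```

With the rows above, the 3164 pair differs by 123 sources (126 vs 249), while the 9396 pair differs by only 1 (216 vs 215).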
I'm not sure whether the difference in source count can cause BUM blackholing, but any feedback on how to diagnose this further will be appreciated.
Thank you!
05-10-2022 01:31 AM
Hello,
I have just read about the following command: 'ip pim pre-build spt'
Perhaps it would be a good option to try? What do you think?
Thank you!
05-19-2022 05:26 AM
ip pim pre-build spt didn't help.
I can also try ip multicast multipath s-g-hash next-hop-based, as I really think this is some kind of RPF issue. Does anybody think that would be worth trying as well?