Multicast traffic in a VXLAN-Bridged VLAN gets duplicated due to ECMP

ss1 · ‎09-02-2021

Hello,

I believe we have an interesting case with our VXLAN spine-and-leaf topology and some of our bridged VLANs.

The topology is a two-layer spine design built on 9364C switches, which route downstream towards the leafs (9396PX). There are two links between each leaf and each spine so that ECMP can be used.

This setup performs well for our unicast traffic. However, a number of VLANs carry a lot of multicast content. For example, one customer sends about 900 Mbit/s of multicast traffic (I think these are TV streams) in VLAN 1234 from our Leaf 1 to our Leaf 2. This traffic is bridged to a vn-segment and then carried in its own multicast group, as configured under 'interface nve1'. The multicast group is unique to this VLAN and not shared with other VLANs, because the volume of BUM traffic is too high.

So we are supposed to transfer 900M through a topology like so:
Leaf 1 --> Spine (A) --> Spine (B) --> Leaf 2

The links between leafs and spines are 2x100G or 2x40G so that ECMP is in service.

Spine (B) received the roughly 1G of traffic for the multicast group and routed it to Leaf 2 as required. However, it also routed the same multicast content back to Spine (A) over the other link in the ECMP group of ports.

After thinking about it for a while, I figured there might be an explanation: both leafs talk and broadcast to one another, so the BUM traffic from Leaf 2 to Leaf 1 has to get back somehow. Even if it is just an ARP request, it is sent back through the same multicast group that carries the 1G stream, because the group is one and the same on both ends.

So here is the question: how can we handle a case like this? Is it a possible and supported scenario to enable SSM, so that the multicast groups are also routed per the source IP address of the nve?

Any feedback would be appreciated.

Thank you.

ss1 · ‎09-03-2021

Let me show you some mroute output in order to explain better what's wrong.

IP Multicast Routing Table for VRF "default"

Total number of routes: 24
Total number of (*,G) routes: 7
Total number of (S,G) routes: 16
Total number of (*,G-prefix) routes: 1

(*, 225.134.213.4/32), uptime: 11:08:41, pim(1) ip(0) 
  RPF-Source: 10.128.1.1 [7/110]
  Data Created: No
  Stats: 18/25092 [Packets/Bytes], 0.000   bps
  Stats: Inactive Flow
  Incoming interface: Ethernet1/50, RPF nbr: 10.18.20.1
  Outgoing interface list: (count: 1) (bridge-only: 0)
    Ethernet1/16, uptime: 09:59:44, pim


(10.128.2.12/32, 225.134.213.4/32), uptime: 11:08:41, ip(0) mrib(0) pim(2)   
  RPF-Source: 10.128.2.12 [8/110]
  Data Created: Yes
  Stats: 2697/184552 [Packets/Bytes], 27.200  bps
  Stats: Active Flow
  Incoming interface: Ethernet1/52, RPF nbr: 10.18.20.5
  Outgoing interface list: (count: 2) (bridge-only: 0)
    Ethernet1/16, uptime: 09:59:44, pim
    Ethernet1/50, uptime: 10:00:56, pim

(10.128.162.134/32, 225.134.213.4/32), uptime: 11:08:41, pim(1) mrib(0) ip(0)
  RPF-Source: 10.128.162.134 [5/110]
  Data Created: No
  Stats: 2652/135252 [Packets/Bytes], 27.200  bps
  Stats: Active Flow
  Incoming interface: Ethernet1/16, RPF nbr: 10.20.134.2
  Outgoing interface list: (count: 1) (bridge-only: 0)
    Ethernet1/52, uptime: 00:35:42, pim

What do we have here:
IP address 10.128.2.12 is the nve source of Leaf A (the one which is streaming the contents)
IP address 10.128.162.134 is the nve source of Leaf B (the one which is "receiving" the contents)

The output above is produced on the Spine layer (the PIM router).
Interfaces Ethernet1/50 and Ethernet1/52 are the 2x40G ECMP links towards the 10.128.2.12 source. Interface Ethernet1/16 leads to the destination of the stream (the receiver, 10.128.162.134).

10.128.162.134, acting as a multicast source, also sends a small amount of BUM traffic into the multicast group 225.134.213.4, for the obvious reasons (ARP broadcasts, IGMP requests, etc.). The Spine switch registers this and shows that it is forwarded back to the "sender" 10.128.2.12 (as expected).

However, the question that remains unexplained is why the source content arriving on Ethernet1/52 is sent back towards the sender 10.128.2.12 through the other member of the ECMP group, namely Ethernet1/50.
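
To make this symptom easier to spot in longer outputs, here is a minimal Python sketch that scans mroute text and flags any entry whose outgoing interface list contains the ECMP sibling of its incoming interface. It is only a detection aid and does not explain the cause. The names parse_mroutes, find_ecmp_sibling_replication and ECMP_GROUPS are hypothetical, the regular expressions assume the line layout shown in the output above, and the ECMP pairing is supplied by hand rather than discovered.

```python
import re

# Trimmed copy of the spine output above (only the fields the check needs).
MROUTE_TEXT = """\
(*, 225.134.213.4/32), uptime: 11:08:41, pim(1) ip(0)
  Incoming interface: Ethernet1/50, RPF nbr: 10.18.20.1
  Outgoing interface list: (count: 1) (bridge-only: 0)
    Ethernet1/16, uptime: 09:59:44, pim

(10.128.2.12/32, 225.134.213.4/32), uptime: 11:08:41, ip(0) mrib(0) pim(2)
  Incoming interface: Ethernet1/52, RPF nbr: 10.18.20.5
  Outgoing interface list: (count: 2) (bridge-only: 0)
    Ethernet1/16, uptime: 09:59:44, pim
    Ethernet1/50, uptime: 10:00:56, pim

(10.128.162.134/32, 225.134.213.4/32), uptime: 11:08:41, pim(1) mrib(0) ip(0)
  Incoming interface: Ethernet1/16, RPF nbr: 10.20.134.2
  Outgoing interface list: (count: 1) (bridge-only: 0)
    Ethernet1/52, uptime: 00:35:42, pim
"""

# Assumption: these two links are the ECMP pair towards the streaming leaf.
ECMP_GROUPS = [{"Ethernet1/50", "Ethernet1/52"}]

ROUTE_RE = re.compile(r"^\((\S+),\s*(\S+)\)")              # "(source, group), ..."
IIF_RE = re.compile(r"^\s+Incoming interface:\s*([^,\s]+)")
OIF_RE = re.compile(r"^\s+(Ethernet\S+?),\s*uptime:")      # indented OIL member


def parse_mroutes(text):
    """Return one dict per route with its source, group, iif and oifs."""
    routes = []
    current = None
    for line in text.splitlines():
        m = ROUTE_RE.match(line)
        if m:
            current = {"source": m.group(1), "group": m.group(2),
                       "iif": None, "oifs": []}
            routes.append(current)
            continue
        if current is None:
            continue  # header lines before the first route
        m = IIF_RE.match(line)
        if m:
            current["iif"] = m.group(1)
            continue
        m = OIF_RE.match(line)
        if m:
            current["oifs"].append(m.group(1))
    return routes


def find_ecmp_sibling_replication(routes, ecmp_groups):
    """List (source, group, iif, oif) where oif is an ECMP sibling of iif."""
    hits = []
    for route in routes:
        for group in ecmp_groups:
            if route["iif"] not in group:
                continue
            for oif in route["oifs"]:
                if oif in group and oif != route["iif"]:
                    hits.append((route["source"], route["group"],
                                 route["iif"], oif))
    return hits


if __name__ == "__main__":
    for source, group, iif, oif in find_ecmp_sibling_replication(
            parse_mroutes(MROUTE_TEXT), ECMP_GROUPS):
        print(f"({source}, {group}): in on {iif}, replicated out of sibling {oif}")
```

Run against the sample above, only the (10.128.2.12/32, 225.134.213.4/32) entry is flagged, which is the case I am asking about.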