
OTV intermittent when passing through ACI

freemen810
Level 1

Recently, we performed a migration where we decommissioned our old N7K (network switch) and moved all devices to ACI (Application Centric Infrastructure).

 

The task involved moving the existing OTV (Overlay Transport Virtualization) router from its current connection in the N7K to the new ACI infrastructure. The original setup had the OTV router connected as follows:

 

- OTV (Outside peer link L3) connected to N7K EXT VDC (Virtual Device Context).

- OTV (Inside L2 Link) connected to N7K-Server Farm VDC.

 

In the new infrastructure, the OTV router was connected as follows:

 

- OTV (Outside peer link L3) connected to a WAN SWITCH.

- OTV (Inside L2 Link) connected to an L2Out ACI Trunk.

 

After the migration, an issue arose with communication to the gateway (10.0.130.1/24 ACI Gateway) located on VLAN 50 at Site A (BGI). The network path for pinging the gateway from Site HTV was as follows:

 

Site HTV --> OTV L2 --> OTV L3 --> WAN Switch --> DWDM (Dense Wavelength Division Multiplexing) --> OTV HTV L3 --> WAN Switch --> OTV L2 --> ACI.

 

The problem was that pinging the gateway (10.0.130.1) from Site HTV was inconsistent: other servers within the same subnet could be pinged reliably, but pings to the gateway in the BGI site ACI were erratic.

 

I'm reaching out to the community for some help. Has anyone faced a similar scenario?


10 Replies

Marcel Zehnder
Spotlight

Hi, so your gateway is configured as a BD IP on ACI, and you are stretching this BD with an L2Out over OTV to another OTV site? Can you post the settings of that BD and also the port configuration of the L2Out (vPC, PC, or single port)? Second thing: is there a specific reason why you use an L2Out? Would it also be possible to stretch this via an EPG static port?

freemen810
Level 1

Hi,

Below is the diagram. The gateway for Site B sits in Site A's BD 50 in the ACI. The intermittent ping only happens when a host in Site B VLAN 50 pings the BD gateway in Site A. When pinging a host in Site A VLAN 50, there is no ping loss.

I noticed something peculiar: when adding an L2 hop (test scenario), the intermittency issue is resolved. I can't seem to pinpoint where the issue is; as far as I can see, the ACI is forwarding the packets along the correct path, which is to the ASR router for the OTV.

 

[attachment: freemen810_1-1690267318075.png]

 

[attachment: freemen810_0-1690267169850.png]

 

[attachment: freemen810_2-1690267351117.png]

 

Marcel Zehnder
Spotlight

Site A and Site B are two individual ACI fabrics? How do the BD settings for "VLAN" 50 look in these two fabrics, and can you confirm unicast routing on BD-50 is only enabled in Site A?

Yes, they are independent fabrics. The BD VLAN 50 in Site A has unicast routing enabled, and the BD in Site B has unicast routing disabled. Furthermore, the BD in Site B does not hold any IP.

Since VLAN 50 is extended over OTV to Site A, the gateway for Site B sits in the BD in Site A.

In the details I mentioned above, the issue only happens when we try to ping the Site A BD VLAN 50 gateway 10.0.130.1, and therefore we cannot reach anything outside the subnet. Furthermore, adding an additional L2 hop in between magically solves the intermittency issue.

This is where we are stuck. Why is it behaving in such a manner? (See the sketch below for how the two BDs contrast.)
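
For reference, here is a minimal sketch of how that contrast between the two BDs might look as APIC REST payloads (POST to /api/mo/uni.xml), assuming the standard ACI object model; the tenant and BD names are illustrative placeholders, not our production objects:

<!-- Hedged sketch. Site A: the BD owns the gateway IP, unicast routing enabled. -->
<fvTenant name="PROD-DC">
  <fvBD name="VLAN0050_BD" unicastRoute="yes">
    <fvSubnet ip="10.0.130.1/24"/>
  </fvBD>
</fvTenant>

<!-- Hedged sketch. Site B: pure L2 BD, unicast routing disabled, no subnet. -->
<fvTenant name="PROD-DC">
  <fvBD name="VLAN0050_BD" unicastRoute="no"/>
</fvTenant>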

Marcel Zehnder
Spotlight
Spotlight

That's strange; from an ACI point of view, I don't see why this intermediate L2 hop changes the behaviour.

Is Site B's BD running in flood or proxy mode? And what's the difference regarding the L1/L2 connection between the OTV router connected directly versus via the L2 hop? (In the layout I see the OTV router is single-connected to one leaf; what about the intermediate switch, is it also connected with just one link, or do you use a vPC?)

Site B is set to flooding.

There is no inherent difference between the Level 1 connection and Level 2 apart from the N7K.

In Level 1, the OTV had its Layer 3 link connected to the N7K via the EXT VDC and the Layer 2 link via the server farm VDC.

 

It's crucial to note that the N7K is a single chassis running 2 VDCs.

 

At Site B we have one OTV router, while Site A has two, and all are OTV adjacency peers to each other.

So if Site A OTV R1 fails, it can use R2 to get over to Site A.

And yes, I've tried failing over the OTV router as well, and the issue still persists.

Marcel Zehnder
Spotlight

I meant whether there is a difference between the two "red" links in the layout:
[attachment: Screenshot 2023-07-25 105403.png]

Ohh.. nope, they are the same trunk port with MTU 9000.

Maybe @dpita, @RedNectar, or @Sergiu.Daniluk has an idea...

freemen810
Level 1 (Accepted Solution)

 

PROBLEM DESCRIPTION
----------------------------------------

Intermittent connectivity between server 10.0.130.20 in the SITE_B site and its default gateway 10.0.130.1 in the BGI site.

Server 10.0.130.20 is connected to the SITE_B ACI fabric, and the default gateway is connected to the BGI ACI fabric. Both fabrics are completely independent of each other and are connected via ASR1k OTV.

The ping from server 10.0.130.20 to gateway 10.0.130.1 works for some time, say 10-20 consecutive successful pings, and then drops for 20-30 pings. Exact success/drop counters change over time, but that's the idea -- there are periods with 100% success ping rate, and periods with 100% failure ping rate.

At the same time, ping from server 10.0.130.20 in the SITE_B site to server 10.0.130.40 in the BGI site works fine all the time. This flow uses the same path through the network as the problem flow between server 10.0.130.20 and default gateway 10.0.130.1.


TROUBLESHOOTING STEPS
----------------------------------------

We took a built-in ACI SPAN capture in the SITE_B site and see that server 10.0.130.20 keeps sending ICMP requests even when it doesn't receive a reply from the default gateway. So it doesn't seem to be an ARP resolution issue in the SITE_B site.

On Leaf 301 in the BGI site we captured packets with tcpdump and see ICMP requests from 10.0.130.20 received and replied to immediately. However, on the same capture, when server 10.0.130.20 starts seeing drops, the packet capture 'stops' -- we don't even receive ICMP requests from 10.0.130.20 during failure periods.

The ASR1k team checked the traffic and sees the packets getting dropped on the ASR1k in the SITE_B site whenever destination MAC 00:22:bd:f8:19:ff -- which is the MAC of default gateway 10.0.130.1 and is supposed to be learned from the BGI site -- is instead learned from the SITE_B site.

SITE_B_ASR_OTV1#show platform packet-trace summary
Pkt  Input          Output         State  Reason
0    Te0/0/1.EFP50  Te0/0/1.EFP50  DROP   263 (L2BDSourceFilter)
1    Te0/0/1.EFP50  Te0/0/1.EFP50  DROP   263 (L2BDSourceFilter)
2    Te0/0/1.EFP50  Te0/0/1.EFP50  DROP   263 (L2BDSourceFilter)
3    Te0/0/1.EFP50  Te0/0/1.EFP50  DROP   263 (L2BDSourceFilter)
4    Te0/0/1.EFP50  Te0/0/1.EFP50  DROP   263 (L2BDSourceFilter)
5    Te0/0/1.EFP50  Te0/0/1.EFP50  DROP   263 (L2BDSourceFilter)
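
For anyone wanting to reproduce this kind of capture, here is a minimal sketch of the IOS-XE conditional packet-trace setup typically used on an ASR1k; the matched host and buffer size are illustrative assumptions:

! Hedged sketch: conditional packet trace on an ASR1k (IOS-XE).
! Matched host (10.0.130.20) and packet count are assumptions for illustration.
debug platform condition ipv4 10.0.130.20/32 both
debug platform packet-trace packet 64
debug platform condition start
! ... reproduce the drops, then inspect and stop:
show platform packet-trace summary
debug platform condition stop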

Here is the output from the SITE_B ASR1k that shows the destination MAC of default gateway 10.0.130.1 being incorrectly learned from the SITE_B site:


SITE_B_ASR_OTV1#sh otv route vlan 50

Codes: BD - Bridge-Domain, AD - Admin-Distance,
SI - Service Instance, * - Backup Route

OTV Unicast MAC Routing Table for Overlay1

Inst VLAN  BD  MAC Address     AD  Owner   Next Hops(s)
----------------------------------------------------------
0    50    50  0022.bdf8.19ff  40  BD Eng  Te0/0/1:SI50           << OTV SITE B INTERFACE to ACI
0    50    50  00fc.ba63.d391  50  ISIS    MYSEL_BGI_L3_ASR_OTV1
0    50    50  3429.8f73.256b  40  BD Eng  Te0/0/1:SI50
0    50    50  4006.d5aa.475a  40  BD Eng  Te0/0/1:SI50
0    50    50  482e.723a.01b6  40  BD Eng  Te0/0/1:SI50
0    50    50  482e.724b.9668  40  BD Eng  Te0/0/1:SI50
0    50    50  98be.940c.dc50  50  ISIS    MYSEL_BGI_L3_ASR_OTV1
0    50    50  b026.28e5.6cdc  50  ISIS    MYSEL_BGI_L3_ASR_OTV1
0    50    50  cc16.7ec1.f73c  50  ISIS    MYSEL_BGI_L3_ASR_OTV1
0    50    50  e4c7.2200.3245  40  BD Eng  Te0/0/1:SI50

10 unicast routes displayed in Overlay1

----------------------------------------------------------
10 Total Unicast Routes Displayed

SITE_B_ASR_OTV1#

Once the MAC table on the ASR1k has 00:22:bd:f8:19:ff pointing to the SITE_A site, connectivity is restored.

SITE_B_ASR_OTV1#sh otv route vlan 50

Codes: BD - Bridge-Domain, AD - Admin-Distance,
SI - Service Instance, * - Backup Route

OTV Unicast MAC Routing Table for Overlay1

Inst VLAN  BD  MAC Address     AD  Owner   Next Hops(s)
----------------------------------------------------------
0    50    50  0022.bdf8.19ff  50  ISIS    SITE_A_ASR_OTV1   << OTV SITE A L3 Interface
0    50    50  00fc.ba63.d391  50  ISIS    SITE_A_ASR_OTV1
0    50    50  3429.8f73.256b  40  BD Eng  Te0/0/1:SI50
0    50    50  4006.d5aa.475a  40  BD Eng  Te0/0/1:SI50
0    50    50  482e.723a.01b6  40  BD Eng  Te0/0/1:SI50
0    50    50  482e.724b.9668  40  BD Eng  Te0/0/1:SI50
0    50    50  98be.940c.dc50  50  ISIS    SITE_A_ASR_OTV1
0    50    50  b026.28e5.6cdc  50  ISIS    SITE_A_ASR_OTV1
0    50    50  cc16.7ec1.f73c  50  ISIS    SITE_A_ASR_OTV1
0    50    50  e4c7.2200.3245  40  BD Eng  Te0/0/1:SI50

10 unicast routes displayed in Overlay1

----------------------------------------------------------
10 Total Unicast Routes Displayed

SITE_B_ASR_OTV1#
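
For context, the Te0/0/1.EFP50 / Te0/0/1:SI50 entries above refer to an Ethernet service instance on the OTV inside interface. Here is a minimal sketch of what the relevant ASR1k OTV configuration typically looks like; the join interface, site identifiers, and adjacency-server address are illustrative assumptions, not our production values:

! Hedged sketch of an ASR1k OTV configuration matching the outputs above.
otv site bridge-domain 999
otv site-identifier 0000.0000.0002
!
interface Overlay1
 no ip address
 otv join-interface TenGigabitEthernet0/0/0
 otv use-adjacency-server 192.0.2.1 unicast-only
 service instance 50 ethernet
  encapsulation dot1q 50
  bridge-domain 50
!
interface TenGigabitEthernet0/0/1
 ! Inside (L2) interface towards ACI; appears as Te0/0/1.EFP50 / SI50 above.
 service instance 50 ethernet
  encapsulation dot1q 50
  bridge-domain 50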

We performed a SPAN packet capture on the SITE_B ACI Leaf 401 interface towards the ASR1k, and see IGMPv3 membership reports being sent towards the ASR1k from the SITE_B site, using source MAC 00:22:bd:f8:19:ff.

The ACI SPAN session configuration is shown below:

APIC-DR-01# fabric 401 show monitor session 7
----------------------------------------------------------------
Node 401 (SITE_B_BLF_0401)
----------------------------------------------------------------
session 7
---------------
name : span-source-10.0.130.1
description : Span session 7
type : local
state : up (active)
mode : access
Filter Group : None
source intf :
rx :
tx : [Eth1/34]
both :
source VLANs :
rx :
tx :
both :
filter VLANs : filter not specified
filter L3Outs : filter not specified
destination ports : Eth1/3

APIC-DR-01#


The result of the local SPAN capture is as follows:

[attachment: freemen810_1-1690456637642.png]

 



Checking IGMP snooping operation on SITE_B Leaf 401, we see IGMP membership reports received from port-channel 2. The outputs below show that internal VLAN 86 (which represents Bridge Domain SITE_B-2-PROD-DC:VLAN0050_AS400_PROD_BD) has IGMP snooping enabled:

SITE_B_BLF_0401# show ip igmp snooping groups
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port

Vlan  Group Address    Ver  Type  Port list
86    */*              -    R     Eth1/34
86    239.255.102.18   v3   D     Po2
86    239.255.255.250  v3   D     Po2
SITE_B_BLF_0401#



SITE_B_BLF_0401# show vlan id 86 extended

VLAN Name Encap Ports
---- -------------------------------- ---------------- ------------------------
86 SITE_B2-PROD- vxlan-15433636 Eth1/23, Eth1/34,
DC:VLAN0050_AS400_PROD_BD Eth1/35, Eth1/36, Po1,
Po2, Po3
SITE_B_BLF_0401#




APIC-DR-01# fabric 401 show ip igmp snooping
----------------------------------------------------------------
Node 401 (SITE_B_BLF_0401)
----------------------------------------------------------------
Global IGMP Snooping Information:
IGMP Snooping enabled
Optimised Multicast Flood (OMF) enabled
IGMPv1/v2 Report Suppression enabled
IGMPv3 Report Suppression disabled
Link Local Groups Suppression enabled

IGMP Snooping information for vlan 86
IGMP snooping enabled
Lookup mode: IP
Optimised Multicast Flood (OMF) disabled
IGMP querier present, address: 10.188.210.66, version: 3, i/f Eth1/34
Querier interval: 60 secs
Querier last member query interval: 1 secs
Querier robustness: 2
Switch-querier disabled
IGMPv3 Explicit tracking enabled
IGMPv2 Fast leave disabled
IGMPv1/v2 Report suppression enabled
IGMPv3 Report suppression enabled
Link Local Groups suppression enabled
Router port detection using PIM Hellos, IGMP Queries
Number of router-ports: 1
Number of groups: 2
VLAN vPC function enabled
Multicast Routing disabled on VLAN
Active ports:
Eth1/23 Eth1/34 Eth1/35 Eth1/36
Po1 Po2 Po3



We configured a new IGMP snooping policy under the tenant, which disables IGMP snooping, and applied it to Bridge Domain SITE_B-2-PROD-DC:VLAN0050_AS400_PROD_BD (a sketch of such a policy as a REST payload follows the verification output below). After that, IGMP snooping shows as disabled under VLAN 86:


APIC-DR-01# fabric 401 show ip igmp snooping
----------------------------------------------------------------
Node 401 (SITE_B_BLF_0401)
----------------------------------------------------------------
Global IGMP Snooping Information:
IGMP Snooping enabled
Optimised Multicast Flood (OMF) enabled
IGMPv1/v2 Report Suppression enabled
IGMPv3 Report Suppression disabled
Link Local Groups Suppression enabled


IGMP Snooping information for vlan 86
IGMP snooping disabled
Lookup mode: IP
Optimised Multicast Flood (OMF) disabled
IGMP querier none
Switch-querier disabled
IGMPv3 Explicit tracking disabled
IGMPv2 Fast leave disabled
IGMPv1/v2 Report suppression disabled
IGMPv3 Report suppression disabled
Link Local Groups suppression disabled
Router port detection using PIM Hellos, IGMP Queries
Number of router-ports: 0
Number of groups: 0
Multicast Routing disabled on VLAN
Active ports:
Eth1/23 Eth1/34 Eth1/35 Eth1/36
Po1 Po2 Po3
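
As referenced above, here is a minimal sketch of how such an IGMP snooping policy could be expressed as an APIC REST payload (POST to /api/mo/uni.xml); the policy name is an illustrative assumption, while the object classes follow the standard ACI model:

<!-- Hedged sketch: IGMP snoop policy with snooping disabled, applied to the BD. -->
<fvTenant name="SITE_B-2-PROD-DC">
  <igmpSnoopPol name="IGMP-Snoop-Disabled" adminSt="disabled"/>
  <fvBD name="VLAN0050_AS400_PROD_BD">
    <fvRsIgmpsn tnIgmpSnoopPolName="IGMP-Snoop-Disabled"/>
  </fvBD>
</fvTenant>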


Once the old entry for MAC address 00:22:bd:f8:19:ff expires from the SITE_B ASR1k OTV route table and the correct entry is learned from the BGI site, connectivity is restored again and stays up:

SITE_B_ASR_OTV1#sh otv route vlan 50

Codes: BD - Bridge-Domain, AD - Admin-Distance,
SI - Service Instance, * - Backup Route

OTV Unicast MAC Routing Table for Overlay1

Inst VLAN  BD  MAC Address     AD  Owner   Next Hops(s)
----------------------------------------------------------
0    50    50  0022.bdf8.19ff  50  ISIS    SITE_A_ASR_OTV1   << Correctly learned MAC address
0    50    50  00fc.ba63.d391  50  ISIS    SITE_A_ASR_OTV1
0    50    50  3429.8f73.256b  40  BD Eng  Te0/0/1:SI50
0    50    50  4006.d5aa.475a  40  BD Eng  Te0/0/1:SI50
0    50    50  482e.723a.01b6  40  BD Eng  Te0/0/1:SI50
0    50    50  482e.724b.9668  40  BD Eng  Te0/0/1:SI50
0    50    50  98be.940c.dc50  50  ISIS    SITE_A_ASR_OTV1
0    50    50  b026.28e5.6cdc  50  ISIS    SITE_A_ASR_OTV1
0    50    50  cc16.7ec1.f73c  50  ISIS    SITE_A_ASR_OTV1
0    50    50  e4c7.2200.3245  40  BD Eng  Te0/0/1:SI50

10 unicast routes displayed in Overlay1

----------------------------------------------------------
10 Total Unicast Routes Displayed

SITE_B_ASR_OTV1#

 

 

 

In summary:

The root cause was that both the Site A ACI fabric and the Site B ACI fabric were using the same MAC address, 0022.bdf8.19ff.

ACI by design uses the same MAC address across the fabric, but when an external extension of a VLAN is introduced, the MAC address is seen on the other end via IGMP and not via broadcast. That is why, in the scenario where we added the C9300, the issue went away: the 9300 also has IGMP snooping enabled by default, therefore becoming a proxy and sending its own MAC address to the OTV router rather than the ACI one.
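
As an aside, since the root cause is both fabrics sharing the ACI default BD MAC, another way to avoid the overlap would be to set a custom MAC on the stretched BD in one fabric. A hedged sketch as an APIC REST payload (the custom MAC value is an illustrative assumption, and this is not the fix we applied):

<!-- Hedged sketch: overriding the default BD MAC on one fabric. -->
<fvTenant name="SITE_B-2-PROD-DC">
  <fvBD name="VLAN0050_AS400_PROD_BD" mac="00:22:BD:F8:19:AA"/>
</fvTenant>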

 

I hope this helps anyone who faces such an issue in the future.

 

Special thanks to Cisco TAC engineer Nikolay Kartashev.