
Bridge Domain Settings for Migration

Nik Noltenius
Spotlight

Hi folks,

 

I learned from several sources (e.g. BRKACI-2508) that in brownfield scenarios the BD settings need to be changed so that endpoints outside the fabric can communicate with those inside on the same Layer 2 domain: enable flooding for unknown unicast and ARP. We have done exactly that.
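
For anyone reproducing the change: the two knobs correspond to the arpFlood and unkMacUcastAct attributes of the fvBD object. A rough Python sketch of setting them through the APIC REST API (the APIC address, tenant, BD and credentials below are placeholders, not our real setup) could look like this:

```python
# Rough sketch (placeholder APIC/tenant/BD/credentials): the two settings map to
# the arpFlood and unkMacUcastAct attributes of the fvBD object.
import requests

APIC = "https://apic.example.com"        # placeholder APIC address
TENANT, BD = "MyTenant", "Legacy-BD"     # placeholder tenant / bridge domain names

session = requests.Session()
session.verify = False                   # lab convenience only

# Authenticate; the session keeps the APIC-cookie for the follow-up request
session.post(
    f"{APIC}/api/aaaLogin.json",
    json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}},
)

# Enable ARP flooding and unknown unicast flooding on the BD
payload = {
    "fvBD": {
        "attributes": {
            "name": BD,
            "arpFlood": "yes",           # flood ARP in the BD instead of unicasting it
            "unkMacUcastAct": "flood",   # flood unknown unicast instead of spine proxy
        }
    }
}
resp = session.post(f"{APIC}/api/mo/uni/tn-{TENANT}/BD-{BD}.json", json=payload)
print(resp.status_code, resp.text)
```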

Now, for testing purposes, we connected four servers to the fabric, two in each of the two Pods. It turned out that every server can only reach the other server in the same Pod, but not the remote ones. From the leaf switches, however, all servers are reachable. In addition, there is a firewall in the legacy network providing the default gateway for the fabric; it too is only reachable from the servers in the Pod that is connected to the legacy world.

Since the BD settings were "known to be correct" we did a lot of troubleshooting in other directions, but in the end (what gives) we changed the BD settings back to default (optimized unknown unicast and no ARP flooding). As if a miracle had occurred, everything works now.

The problem is: Why did the "correct" settings not work, and will we run into issues once we add endpoints to the VLAN outside the fabric?

Why can we reach the outside gateway now, when the settings are "wrong"?

Shouldn't the presumably non-optimal settings always work anyway, even if they might not get the best performance from the fabric?

 

Any hints are much appreciated :)

 

Thanks and regards

Nik

 

P.S. ACI 3.1(1i), Multi-Pod with two Pods


5 Replies

richmond
Level 1

It sounds like the multicast in the Inter-Pod Network (IPN) may not be functioning correctly.

When you set unknown unicast to flood and ARP to flood, these frames are encapsulated in VXLAN with the Bridge Domain's GIPo multicast address as the destination.

 

If multicast in the IPN is broken then these packets will be lost between pods.

 

When you switch to Unknown Unicast Proxy, these packets are no longer sent as multicast between pods. Provided the endpoints are learnt in COOP, the spine will unicast-forward packets to the leaf VTEP where they need to go. This would explain why communication works when you turn off flooding. What it doesn't explain is why the servers in the remote pod can reach the outside world with ARP flooding disabled, unless you have an IP configured on the Bridge Domain.

 

With proxy mode there are scenarios where packet drops can occur. If the bridge domain is layer 2 only and the gateway is outside the fabric, and a host outside the fabric has not yet sent a frame that was flooded through the legacy network to the border leaves, then when an on-fabric host tries to talk to it the spine will drop the packets because the entry is missing in COOP. If ARP flooding is enabled, you will likely still learn the MAC address at layer 2, because the ARP packets are flooded and the host outside the fabric responds. However, if you have a system that does not use ARP, you will have communication issues unless unknown unicast forwarding is also set to flood.

 

ARP will break if the bridge domain has no subnet configured unless you have the flood option for ARP turned on. Without an IP on the bridge domain the spine cannot send ARP glean messages (ARP on behalf of the host) for unknown endpoints.

 

When the gateway moves to the fabric you no longer require these settings except in corner cases.

 

See here: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/ACI_Best_Practices/b_ACI_Best_Practices/b_ACI_Best_Practices_chapter_010.html

 

I would check your multicast in the IPN. Run a test sending broadcast, multicast or unknown unicast packets between pods and see if they reach the other side (e.g. with a tool like Scapy and packet captures or ELAM captures in the remote pod).
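
A rough Scapy sketch for the sending side (interface, VLAN and addresses are placeholders, adjust to one of your test servers) could look like this:

```python
# Rough sketch (placeholder names/addresses): craft an "unknown unicast" frame
# and send it from a server in Pod 1; capture in Pod 2 to see if it crosses the IPN.
from scapy.all import Ether, Dot1Q, IP, UDP, sendp

IFACE = "eth0"                           # test server interface (placeholder)
VLAN_ID = 100                            # EPG encap VLAN; drop Dot1Q if the port is untagged
FAKE_DST_MAC = "00:de:ad:be:ef:01"       # MAC nobody has learnt -> treated as unknown unicast

frame = (
    Ether(dst=FAKE_DST_MAC)
    / Dot1Q(vlan=VLAN_ID)
    / IP(src="10.0.0.10", dst="10.0.0.99")   # placeholder addresses in the BD subnet
    / UDP(sport=9999, dport=9999)
    / b"unknown-unicast flood test"
)

# Sending a few copies makes them easy to spot in a capture on the remote side.
sendp(frame, iface=IFACE, count=5)
```

On a server in the remote pod, run a capture (e.g. tcpdump -ni eth0 udp port 9999). If the frames never arrive while flood mode is configured on the BD, then BUM traffic is not making it across the IPN.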

Hello Richmond,

Thank you for the detailed reply. I haven't been able to check the multicast yet but will definitely do so.

I'll update this discussion when I get any results.

I also agree. Multicast in the IPN was my first thought, and we've seen this type of behavior pod to pod when mcast is broken in the IPN.
CCIE RS 34827

Dan, Richmond,

 

Unfortunately I haven't been able to do any real testing yet because of some logistical dependencies. However, I checked the entire configuration and ran every multicast and PIM show command I could find.

From my - admittedly limited - knowledge of PIM and IGMP nothing seems to be obviously wrong. There are group entries, IGMP joins, a couple of routes, etc.

Do you maybe have some "this is how it should look" outputs that I can compare my results to?

Anyway, I'm still following the test approach you suggested.

 

Kind regards,

Nik

Hi everyone,

To sum this up: it was indeed a problem with multicast in the IPN. It turned out we were using a subnet address for the RP, which shouldn't be a problem with the /31 on the first router but definitely was a problem with the longer subnet masks on the other IPN devices.
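
For anyone hitting the same thing, here is a small illustration with made-up addresses (not our real config) of why the same address can be a usable host on a /31 but becomes the invalid subnet address under a longer mask:

```python
# Made-up addresses: the RP address we had chosen was the all-zeros host of its
# range, which is a usable host on a /31 but the subnet address on longer masks.
import ipaddress

rp = ipaddress.ip_address("192.0.2.0")   # example RP address

for prefixlen in (31, 30, 29):
    net = ipaddress.ip_network(f"192.0.2.0/{prefixlen}")
    usable = rp in list(net.hosts())     # hosts() excludes network/broadcast, except on /31
    print(f"/{prefixlen}: {rp} is a usable host? {usable}")

# Expected (recent Python 3): /31 -> True, /30 -> False, /29 -> False
```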

Thanks for the help and best regards

Nik
