EPG vMAC sent by Leaf in ARP Reply or GARP instead of L3OUT vMAC

guillerm
Level 1

Hello,
we have an ACI dual-fabric environment (version 12.0(2l)):

1 ACI fabric per DC; both fabrics are interconnected by layer-2 vPCs (DCI) between dedicated leaves in each DC
all EPG VLANs are extended between both DCs
L3OUT VLANs to the outside world are not extended through the fabrics: the L3OUT VLANs are extended between both DCs via the 2 Cat6Ks where the L3OUT links are connected

On all BDs (1 BD per EPG), 1 distinct physical MAC (pMAC) is configured per DC and 1 shared virtual MAC (vMAC) on both DCs:
the same pMACs and vMAC are used for all BDs;
for each BD, a distinct IP is associated with each of those 3 BD MACs: 2 physical IPs and 1 virtual IP

for example:
BD pMAC = 00:22:BD:F8:19:F1 on the DC1 fabric and 00:22:BD:F8:19:F2 on the DC2 fabric
BD vMAC = 00:22:BD:F8:19:F0 on both DC fabrics
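
For reference, those per-BD MACs can be double-checked from the APIC CLI of each fabric with something like the query below (only a sketch: "MY_BD" is a placeholder for the real BD name, and the exact moquery filter syntax may vary with the software version):

  # run on the APIC of each fabric; "MY_BD" is a placeholder
  moquery -c fvBD -f 'fv.BD.name=="MY_BD"' | egrep 'name|mac|vmac'
  # expected: mac = the per-DC pMAC, vmac = the shared vMAC (00:22:BD:F8:19:F0 here)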


On the L3OUTs, 1 shared vMAC is configured on both DCs: the same vMAC on all L3OUTs connected to the outside active/standby firewalls via the Cat6Ks (1 per DC)
L3OUT vMAC = 00:22:BD:F8:19:FF on both DC fabrics
5 distinct IPs are defined per L3OUT: 1 per leaf used to connect the local Cat6K (2 leaves used in vPC per DC) and 1 virtual IP shared between both DCs
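
As a sanity check, those per-leaf and shared L3OUT addresses can be listed on each border leaf with something like the command below (the VRF name is a placeholder; the shared address normally shows up as a secondary address on the L3OUT SVI):

  show ip interface vrf TENANT:VRF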


We have observed instability related to those MACs when, for example, a PC outside the fabrics communicates (via RDP, SSH, pings, ...) with VM servers connected to either fabric.
This problem is easily reproducible by simply pinging the outside firewall facing the ACI leaves, via the associated L3OUT (commands sketched below):
a) if an iping is done from an ACI leaf towards this firewall (defined as the default gateway in the L3OUT config), without specifying any source IP, there is no problem
b) when doing the same iping to the firewall but specifying one of the ACI EPG IPs as the source, pings stop working for about 20 seconds every 1 or 2 minutes
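
For clarity, the two tests look roughly like this from a border leaf CLI (the VRF name, firewall IP and EPG source IP below are placeholders):

  # case a) no source IP specified: no loss at all
  iping -V TENANT:VRF 10.1.1.254
  # case b) sourced from one of the EPG/BD SVI IPs: pings stop for ~20 s every 1 or 2 minutes
  iping -V TENANT:VRF -S 10.2.2.1 10.1.1.254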

When taking a monitor session capture on both Cat6Ks on all the concerned port-channels (the one to the outside FW, the one to the local ACI, and the one to the Cat6K in the other DC), it appears that GARPs and ARP replies are regularly sent by both ACI fabrics towards their local Cat6K with either the L3OUT vMAC as source MAC (which seems logical, since they flow over an L3OUT segment) or the BD vMAC 00:22:BD:F8:19:F0 (which seems less logical, since the L3OUT is not configured with this vMAC at all).
The IP presented by ACI in those GARPs and ARP replies is always the correct virtual IP defined on the L3OUT on both DCs.

And, according to the tests performed, it is clear that the ipings start to fail when the MAC presented by ACI in the GARP/ARP replies changes.
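
One quick way to see this from the Cat6K side, without a full capture, is to check on which port/VLAN each vMAC is being learned while the pings fail (the exact mac address-table syntax depends on the IOS release; the addresses are the ones quoted above):

  show mac address-table address 0022.bdf8.19ff
  show mac address-table address 0022.bdf8.19f0
  ! the BD vMAC 00:22:BD:F8:19:F0 should normally never be learned on the L3OUT VLAN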

Any idea would be welcome to understand why this annoying phenomenon occurs.

4 Replies

Marcel Zehnder
Spotlight

Hi

Long story short: dual fabric with a stretched firewall cluster is a pain. Have a look at the following documents; they describe how to implement this setup:

White Paper/Design Guide:

http://www.cisco.com/c/en/us/solutions/data-center-virtualization/application-centric-infrastructure/white-paper-c11-737077.pdf

Video:

https://www.youtube.com/watch?v=Qn5Ki5SviEA

HTH

Marcel

Hello Marcel,

thanks for your update;

we had already reviewed the design guide you mentioned and tried to understand as much as we could

Well, in our case, we don't use a firewall cluster as described in the guide and in the video:

we simply use an active/standby FW setup (1 per DC) with static routing only: only 1 FW is active at a time;

and we don't attach the Firewalls to an EPG extended between both DCs, as described in the video;

but we attach those firewalls to a specific L3OUT in each DC: the same VLAN is used for this L3OUT on both DCs, but the VLAN is not extended between the two ACI fabrics through the DCI links, since this is not allowed; the VLAN is extended on the firewall side via the L2 switches located between the ACI leaves and the FW in each DC

so it is less complex, and this design was validated by Cisco and worked properly for weeks in a pilot environment

and now, for no apparent reason, we get this really annoying problem on only 1 tenant ...

we checked all parameters (VLANs, IPs, flooding, contracts, ...) between the failing tenant and the good ones: nothing different detected on ACI

quite strange

Yes, it's strange. However, we noticed similar issues in a dual-fabric scenario. In the end we migrated to a Multi-Pod fabric... and all problems were solved.

Joseph Ristaino
Cisco Employee

To summarize, TAC and the BU are investigating. A bug has been filed and is being looked at:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvd22682/?reffering_site=dumpcr

Joey
