04-18-2014 09:23 AM
We had two 3120 blade switches fail on us in our production environment and when we plugged in the RMA replacements, it caused a loop that took down our VM environment:
Apr 15 17:33:11: %SW_MATM-4-MACFLAP_NOTIF: Host xxxx.56a4.0090 in vlan 2224 is flapping between port Po14 and port Po30
Apr 15 17:33:17: %SW_MATM-4-MACFLAP_NOTIF: Host xxxx.56a4.0095 in vlan 2224 is flapping between port Po16 and port Po32
Apr 15 17:33:19: %SW_MATM-4-MACFLAP_NOTIF: Host xxxx.56ad.16b6 in vlan 2054 is flapping between port Po14 and port Po30
Apr 15 17:33:19: %SW_MATM-4-MACFLAP_NOTIF: Host xxxx.56a4.008c in vlan 2224 is flapping between port Po15 and port Po31
Apr 15 17:33:21: %SW_MATM-4-MACFLAP_NOTIF: Host xxxx.56ad.6264 in vlan 2054 is flapping between port Po16 and port Po32
Apr 15 17:33:23: %SW_MATM-4-MACFLAP_NOTIF: Host xxxx.56ad.522e in vlan 2055 is flapping between port Po42 and port Po26
Has anyone else experienced this? In vSphere the NICs are set up for 'IP hash', and the OS team claims nothing has changed in their configuration. Our blade switches are plugged into an older HP C7000 chassis that has had a number of blade failures of late, both server and switch. I suspect this is a chassis issue.
Thanks!
-anne
04-19-2014 12:41 AM
Hi Anne,
I suspect the loop you saw was caused by the fact that on a new switch, i.e. one with no configuration, all switchports are part of the same default VLAN. If your VMware servers are using route based on IP hash, i.e. port-channels, they will balance traffic from a single VM across both of their NICs. If your Cisco switches had the default configuration, the connections to the servers would not have been configured as port-channels, so the switches would have seen the same MAC address flapping between two ports.
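As a rough illustration only (the interface and channel-group numbers below are made up, not taken from your configuration), a server-facing port that matches IP hash on the ESX side would normally look something like this. A factory-default port has no channel-group line at all, which is why the switch treats each NIC as a separate path:

```
! Hypothetical numbering; adjust to the real blade-facing ports
interface Port-channel14
 switchport trunk encapsulation dot1q
 switchport mode trunk
!
interface GigabitEthernet1/0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 ! Static EtherChannel, no negotiation protocol
 channel-group 14 mode on
```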
Are you able to supply some additional information on how things are connected? For example:
- Confirm it is the VMware ESX hosts in the blade chassis that are configured with IP hash?
- Are the Catalyst 3120 switches configured as a Virtual Blade Switch?
- The connectivity from the blade switches to the external network and the configuration of that network
Also, the two switches that were replaced:
- Were these both part of the same chassis?
- What was the process to replace them e.g., plugged in, all external cables connected and then configured?
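If it helps, the following show commands are a reasonable starting point for answering the questions above. This is a sketch rather than a complete checklist, and the MAC address is simply the masked one from your log:

```
! Lists the stack members; more than one member means the 3120s are stacked
show switch
! Lists each port-channel, its protocol (LACP, PAgP, or none for static "on") and its member ports
show etherchannel summary
! Shows which port a flapping MAC address is currently learned on
show mac address-table address xxxx.56a4.0090
```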
Regards
04-22-2014 08:51 AM
Are you able to supply some additional information on how things are connected? For example:
- Confirm it is the VMware ESX hosts in the blade chassis that are configured with IP hash?
*Yes, they are configured with IP hash, except for 2 NICs which they have x'd out, meaning they are disabled.
- Are the Catalyst 3120 switches configured as a Virtual Blade Switch?
*I don't know, how would I know that?
- The connectivity from the blade switches to the external network and the configuration of that network
*Each switch stack has 2 10Gig uplinks to Nexus 5Ks that were not involved in the MAC flapping. The loop was between the VMs...
Also, the two switches that were replaced:
- Were these both part of the same chassis?
*They were on separate chassis.
- What was the process to replace them e.g., plugged in, all external cables connected and then configured?
*There were no external cables on these switches except for the stacking cables...no uplinks.
06-30-2014 01:19 PM
We have not scheduled an outage to investigate this further; however, we believe the issue is that we are not using LACP on the trunks to the VMs. When the environment was set up, LACP wasn't an option and we had to set the port channels to 'on' instead of 'active'. Now that we've upgraded our non-prod ESXi environment from 4.x to 5.5, we are able to use LACP on the trunks, and that should resolve the loop.
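For anyone following along, the switch-side change would be roughly the following. The interface and channel-group numbers are placeholders, and this assumes the ESXi uplinks are moved to LACP in the same window, since a port set to 'active' will not bundle with a peer that is still static:

```
! Hypothetical interface and channel-group numbers
interface GigabitEthernet1/0/1
 ! Remove the static bundle membership (previously: channel-group 14 mode on)
 no channel-group
 ! Rejoin the bundle with LACP negotiation
 channel-group 14 mode active
```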