Solved: Etherchannel Flapping

VRAI · ‎02-24-2011

I have two Cisco 2960G switches being used in an "active, passive" configuration. The switches are connected via a 4-port EtherChannel on each switch (8 ports in total). Recently there have been some logs showing some "flapping" going on. This is some of the output:

1d: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.297c.1111 in vlan 100 is flapping between port Po3 and port Po6
1d: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.2958.2222 in vlan 100 is flapping between port Po2 and port Po6
1d: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.290d.3333 in vlan 100 is flapping between port Po2 and port Po6
1d: %SW_MATM-4-MACFLAP_NOTIF: Host 0050.56a3.4444 in vlan 100 is flapping between port Po2 and port Po6
1d: %SW_MATM-4-MACFLAP_NOTIF: Host 000c.295b.5555 in vlan 100 is flapping between port Po2 and port Po6
1d: %SW_MATM-4-MACFLAP_NOTIF: Host 0050.56a3.6666 in vlan 100 is flapping between port Po3 and port Po6
1d: %SW_MATM-4-MACFLAP_NOTIF: Host 0050.56a3.7777 in vlan 100 is flapping between port Po6 and port Po1
1d: %SW_MATM-4-MACFLAP_NOTIF: Host 0050.56a3.8888 in vlan 100 is flapping between port Po6 and port Po1

Now I'm fairly sure that the configs have been changed and that this is causing these issues, but I wanted a sanity check. I have attached the configs for both switches.

Also, I would add that these switches are attached to ESX hosts running VMware in an active/passive config, which is why the ports are in access mode.

Another thing that might be worth mentioning is that I am using inter-VLAN routing:

Router (802.1q)
|
Switch
| | | |
Switch
||||||||||
ESX Hosts

thanks for any help

andrewswanson · ‎02-24-2011

hello

are you seeing the flapping on the switch connected to the ESX hosts?

if you are using Nic Teaming in ESX, which of your switch interfaces are in the team and what load balancing algorithm are you using for that team?

cheers
andy

Latchum Naidu · ‎02-25-2011

Hi,

Has there been any config change on the switch end or the ESX server end?

With EtherChannels it is very important to figure out whether both of those ports are connected to the same ESX server or not.

The reason is simple: if they are connected to the same server (with dual NICs), what you are seeing is some kind of load-balancing mechanism on the server side (NIC teaming or similar) in which both NICs send frames sourced from the same MAC address at the same time.

This is not recommended, as you force the switch to learn the same MAC over and over. Since this operation is done in software by the CPU, you can hit a high-CPU condition too.

Judging by those MAC addresses, they come from VMware, so they are virtual. You really need to work with the administrator to figure out what was changed.

Ask the administrator to use a different load-balancing scheme. Ideally they should keep one port in standby while the other transmits. If they need both ports to transmit at the same time, the traffic should be sourced from 2 different MAC addresses, or else you will always see the flapping. They can configure a unique multicast MAC address, but you need to know what you are doing, as you can run into other problems depending on the platforms you have.

------

If they are connected to 2 different servers, the servers are clearly misconfigured: VMware is on both of them but is configured to use the same MAC address on the 3 VLANs. Solution: tell the administrator to change them (and to stop messing with your network).

--------

If they are connected to 2 switches (they might still be, as those ports are trunks, even if CDP is disabled), you have a loop on those VLANs.
How to fix it: draw a topology of those VLANs including all the bridges and their ports. Check the status of each port and make sure that one (or several) are in BLK state, so that you don't have a redundant path active. Use "show span vlan x detail" to see if TCNs are increasing.
If you have redundant paths and you don't know how to troubleshoot STP issues, go ahead and configure spanning-tree loop guard on all your switches. If all your gear is Cisco, also configure UDLD.
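
As a rough sketch of the checks above on a Catalyst switch: VLAN 100 is taken from the logs in this thread, GigabitEthernet0/1 is only a placeholder for an inter-switch link, and the assumption is that the links are copper, where UDLD has to be enabled per interface.

```
! Check whether the topology change count keeps increasing for the VLAN in the logs
show spanning-tree vlan 100 detail
!
configure terminal
 ! Enable loop guard globally on this switch
 spanning-tree loopguard default
 ! Enable UDLD on an inter-switch link (placeholder interface name)
 interface GigabitEthernet0/1
  udld port aggressive
 end
```

Running the show command a few times in a row and comparing the topology change counters is usually enough to tell whether STP is involved at all.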

Hope this helps.

Please rate the helpful posts.
Regards,
Naidu.

VRAI · ‎02-25-2011

One side should be active and the other passive; that's how it was set up.

VRAI · ‎02-25-2011

Yep, the flapping is on the switches connected to the ESX hosts.

The etherchannel mode is "on"

I have attached the switch configs in my original post (is this what you mean?)

Latchum Naidu · ‎02-25-2011

Yes,

You need to sit down with the ESX administrator and check the config for any changes or misconfiguration.

Please rate the helpful posts.

Regards,

Naidu.

andrewswanson · ‎02-25-2011

hello

have a look at the following Cisco/Vmware document:

http://www.cisco.com/application/pdf/en/us/guest/netsol/ns304/c649/ccmigration_09186a00807a15d0.pdf

there is a short chapter on "Using NIC Teaming for Connectivity Redundancy". i was wondering if your ESX is configured as either of the following:

Active/Active with Mac or Port Based loadbalancing method
in this scenario,

VM traffic will use one particular physical switch interface in the ESX NIC team (based on the VM's Mac address or Port Group) -
if that switch interface fails, VM traffic will then use another, but essentially traffic from one particular VM should always exit ESX by the same physical switch interface

or

Active/Active with IP Based loadbalancing method
in this scenario,

VM traffic will be loadbalanced across ALL the physical switch interfaces in the ESX NIC team (based on the VM traffic source/destination IP addresses)
for this to work properly, all physical switch ports in the ESX NIC Team should be in the SAME physical switch etherchannel.
if your ESX has teamed 2 or more physical switch etherchannels, ESX will load balance VM traffic across BOTH those etherchannels - in this case your switch will report that a host is "flapping" between your portchannels.
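
to check the switch side of this, something like the following should do (a sketch only; it assumes the switch hash should match what ESX is doing, and src-dst-ip is the setting mentioned elsewhere in this thread):

```
! Which physical ports are bundled into which port-channel
show etherchannel summary
! Which hashing method the switch uses when it sends over a channel
show etherchannel load-balance
!
configure terminal
 ! Global setting: hash on source and destination IP address
 port-channel load-balance src-dst-ip
 end
```

if the ports facing one ESX NIC team show up under more than one Po interface in the summary, that matches the flapping case described above.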

hope that makes sense - the document is well worth a read and will explain the above better.

hth
andy

VRAI · ‎02-25-2011

Thank you for this, looks promising.

The VMware admin guy assures me they are set to active/passive.

Latchum Naidu · ‎02-25-2011

Please remember to rate all the helpful posts.

Regards,

Naidu.

VRAI · ‎02-25-2011

Not sure if this helps but this is what the config should look like on the esx hosts (attached spreadsheet)

Latchum Naidu · ‎02-25-2011

OK,

Regarding the EtherChannel config at the ESX host end, see the discussion below in the VMware forum, which may help you verify the settings on the ESX side.
http://communities.vmware.com/thread/136547

On your Cisco device, compare the config with the one below, which follows best practices.

interface Port-channel1
 description VMware ESX Adapter0 Network
 no ip address
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 2,3
 switchport mode trunk
 switchport nonegotiate
!
! Member ports must carry the same trunk settings as the port-channel,
! including the allowed VLAN list
interface GigabitEthernet1/1
 description VMware ESX EtherChannel link 0
 no ip address
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 2,3
 switchport mode trunk
 switchport nonegotiate
 channel-group 1 mode on
!
interface GigabitEthernet1/2
 description VMware ESX EtherChannel link 1
 no ip address
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 2,3
 switchport mode trunk
 switchport nonegotiate
 channel-group 1 mode on

Please rate all the helpful posts.
Regards,
Naidu.

andrewswanson · ‎02-25-2011

hello

i've modified one of my documents with some of your details (hope i got this correct):

esx01 vswitch2 uses vmnics 8,9,12 and 13

how are these vmnics teamed?

if you are using active/active ip based:

vm traffic leaving vswitch2 will be loadbalanced across all 4 vmnics - the result would be (from switch1) that the VM is flapping between po1 and po6.
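
one way to watch this from the switch (a sketch; the MAC address and VLAN are copied from one of the log lines in the original post, and Po1/Po6 are the channels named in that line):

```
! Run this several times in a row; if the port listed for the MAC
! alternates between Po1 and Po6, the same VM is arriving on both channels
show mac address-table address 0050.56a3.7777 vlan 100
!
! List the physical member ports of each channel, to trace them back to vmnics
show etherchannel 1 summary
show etherchannel 6 summary
```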

have a chat with the vmware admin as to the nic teaming setup and loadbalancing method.

hth

andy

VRAI · ‎02-25-2011

Thanks for the help so far. I have finally managed to get a screenshot of our ESX network config (attached).

I hope it illustrates things better than I have been able to!

If it proves useful, would you like me to include screenshots of the other 2? (There are 3 ESX hosts altogether.)

VRAI · ‎02-25-2011

This next screenshot looks like it confirms that the ESX NIC teaming has been changed to Active/Active?

Is this the case? (Sorry, I have no clue when it comes to VMware.)

When I originally configured the switches, I was told by the VCP expert guy to use src-dst-ip. I take it that the above now shows a different config?

andrewswanson · ‎02-25-2011

yes - all your vmnics are active and esx is using IP hash to loadbalance between them - so esx decides what interface to send vm traffic out based on the source/destination ip address of the traffic - this means vm traffic (from the same vm) will be spread out across all 4 vmnics so you will see that VM host 'flapping' on your switches.
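
if you want to keep ip hash, the switch ports facing that vswitch would all need to sit in one static etherchannel on one switch (the 2960G cannot bundle ports across two separate switches). a sketch of that, with assumptions: the interface range is a placeholder for the four ports cabled to vmnics 8, 9, 12 and 13, channel-group 1 is arbitrary, and access vlan 100 is taken from your logs and your note that the ports are in access mode:

```
configure terminal
 ! Placeholder range: the four ports cabled to esx01 vmnic8, 9, 12 and 13
 interface range GigabitEthernet0/1 - 4
  switchport mode access
  switchport access vlan 100
  ! Static bundle, matching the "on" mode already used in this thread
  channel-group 1 mode on
 end
```

if the vmnics are split across your two switches, this will not work, and the other way out is for the vmware admin to put the team back to active/standby (or a mac/port based method) so that each vm only ever shows up on one switch interface.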

hth

andy