Is this a network design defect?

joechankc2013
Level 1

A single ASA5520 (FW) connected to the Internet is configured with an EtherChannel to a switch stack (two C2960S switches) at my site. The simplified network is shown in the diagram below.

network_diagram

This network design has been in place for a few years without problems, and the corresponding resilience test passed. In that test, the stack master (SW1) was powered off and the slave switch (SW2) took over the workload. Traffic could still be routed via SW2 between the FW and the servers.

Recently, the stack master (SW1) died in an incident. The slave switch (SW2) took over the workload, but the servers could not access the Internet.

Is this a network design defect?  Any remedial suggestions?  Thank you very much.

2 Accepted Solutions


willwetherman
Spotlight

Hi,

I have run into this issue before and discovered that the Cisco ASA does not support an Etherchannel with ports that are connected to separate switches in a stack.

http://www.cisco.com/c/en/us/td/docs/security/asa/asa84/configuration/guide/asa_84_cli_config/interface_start.html#wp1329030

"The ASA does not support connecting an EtherChannel to a switch stack. If the ASA EtherChannel is connected cross stack, and if the Master switch is powered down, then the EtherChannel connected to the remaining switch will not come up."

My understanding is that the switch stack derives its LACP system-id from the stack MAC address, which is taken from the stack master switch. If the stack master fails, the LACP system-id changes, which the ASA doesn’t like, and the Etherchannel fails.

This issue can be fixed by configuring ‘stack-mac persistent timer 0’ on the switch stack which forces the stack MAC and LACP system-id to stay the same when the stack master fails. Also I believe it’s possible to configure the Etherchannel to use unconditional mode (mode on) instead of LACP to fix the issue but I have not tested this myself.
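As a rough sketch of the two workarounds above (not a tested configuration): the switch-side port names Gi1/0/1 and Gi2/0/1 are placeholders, since the thread does not say which stack ports face the ASA. Port-channel1 and channel group 2 are taken from the outputs posted later in this thread. Option A keeps LACP and pins the stack MAC; option B drops LACP altogether and has to be applied on both ends.

```
! --- Option A: on the switch stack, keep the stack MAC (and hence the
! --- LACP system-id) unchanged after a master failure
configure terminal
 stack-mac persistent timer 0
 end
show switch
show lacp sys-id

! --- Option B (untested here): static EtherChannel, no LACP negotiation
! --- Switch stack; Gi1/0/1 and Gi2/0/1 are placeholder port names
configure terminal
 interface range GigabitEthernet1/0/1 , GigabitEthernet2/0/1
  no channel-group
  channel-group 1 mode on
 end

! --- ASA; channel group 2 as shown by 'show lacp internal'
! --- (the existing channel-group line may need to be removed first)
configure terminal
 interface GigabitEthernet0/1
  channel-group 2 mode on
 interface GigabitEthernet0/2
  channel-group 2 mode on
 end
```

One caveat with option A: with the timer set to 0 the stack keeps using the old master's MAC indefinitely, so a failed master that is later redeployed elsewhere in the same network could bring the same MAC address with it.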

There are a number of Cisco bug IDs created for this issue - https://bst.cloudapps.cisco.com/bugsearch/bug/CSCtw63011/?referring_site=bugquickviewredir

I hope that this helps.


Jon Marshall
Hall of Fame

Just to add that if you are not comfortable with the etherchannel solution or it does not work then you can use the ASA redundant interface feature which would allow you to connect the two ASA interfaces to different switches in the stack.

Assuming the connections are gigabit you should not see any performance issues as far as I am aware.
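A minimal sketch of what the redundant interface could look like on the ASA, assuming the two physical ports from the existing channel group (Gi0/1 and Gi0/2) are reused. The interface number 1 is an arbitrary choice. The name, security level and (in transparent mode) bridge-group settings currently on the port-channel are not shown in this thread, so they appear only as a comment.

```
! --- ASA: replace the EtherChannel with an active/standby redundant pair
configure terminal
 interface GigabitEthernet0/1
  no channel-group 2
  no shutdown
 interface GigabitEthernet0/2
  no channel-group 2
  no shutdown
 interface redundant 1
  member-interface GigabitEthernet0/1
  member-interface GigabitEthernet0/2
  ! re-apply here whatever nameif / security-level / bridge-group
  ! settings the port-channel interface carried
 end
show interface redundant 1
```

On the switch side, the two ports facing the ASA would have to come out of the port-channel and be configured as ordinary ports carrying the same VLAN(s), since only one member of a redundant interface forwards traffic at a time.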

Just a suggestion.

Jon


10 Replies

Reza Sharifi
Hall of Fame

Have you tested server fail-over to make sure if sw1 is gone the traffic can be forwarded to sw2 using the second NIC?

When sw1 fails, can the servers still ping the gateway? Is the firewall the gateway for the servers?

Hi Reza Sharifi,

Thank you for your reply. 

(1) In the resilience test, the master unit (SW1) of the C2960S stack was powered off. The NIC teaming monitor showed that network traffic was routed from the Server to the secondary unit (SW2) of the stack. The Client could ping and access the Server via the ASA5520 (FW) and SW2.

(2) In the incident, the master unit (SW1) of the C2960S stack failed. The failed switch looked as if it were powered off. The NIC teaming monitor again showed that network traffic was routed from the Server to SW2. The Client could access the ASA5520 (FW), but the Client could not ping or access the Server, and the Server could not access the ASA5520 (FW).

Hi willwetherman,

Thank you for your detailed explanation about the (LACP) System-ID. 

I agree that configuring 'stack-mac persistent timer 0' would be a workaround. However, I have no idea how to test it, as the faulty C2960S (SW1) has been replaced with a spare by the maintenance vendor.

The resilience test was successful before the switch failure incident. The regression test and resilience test were also successful after the switch replacement. So the fail-over succeeds in the resilience test, but it did not succeed in the incident.

Since the resilience test succeeds when SW1 is powered off, how can the SW1 failure be simulated with 'stack-mac persistent timer 0' configured on the switch stack?

Should the network design be changed to unstack the switches? What remediation did you apply in your case?

Hi Joe,

 

That is an interesting issue. When I first encountered this issue, and every time since, the ASA Etherchannel has always failed when the stack master was powered off. I'm unsure why this would be different in your scenario.

Can you confirm if the Etherchannel is using LACP or unconditional mode (mode on)? Also has the stack-mac persistent timer value been set to a non-zero value on the switch stack?
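For anyone checking the same two things, the following show commands should answer them. This is a sketch of where to look, not captured output.

```
! --- Switch stack
show etherchannel summary
show lacp sys-id
show lacp neighbor
show running-config | include stack-mac

! --- ASA
show port-channel summary
show lacp neighbor
```

The protocol column of the summary commands indicates whether the bundle is running LACP or is statically "on", and `show lacp neighbor` on the ASA lists the partner system ID it currently sees from the stack. If the `include stack-mac` filter returns nothing, the persistent timer is most likely at its default.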

Hi willwetherman,

I learnt from an old document that the resilience fail-over test was successful.

No configuration saved before this incident can be found. Here is the switch state after the incident.

SW1#show switch
Switch/Stack Mac Address : 2c3e.cf5e.3380
                                           H/W   Current
Switch#  Role   Mac Address     Priority Version  State
----------------------------------------------------------
 1       Member 0000.0000.0000     0      1       Removed
*2       Master 2c3e.cf5e.3380     1      1       Ready

Here is the configuration after the faulty switch replacement.

SW1#show switch
Switch/Stack Mac Address : 2c3e.cf5e.3380
                                           H/W   Current
Switch#  Role   Mac Address     Priority Version  State
----------------------------------------------------------
 1       Member 2c3e.cf5e.2600     15     1       Ready
*2       Master 2c3e.cf5e.3380     14     1       Ready

"no stack-mac persistent timer" is used in the running configuration (extracted).

SW1#show running-config

......

switch 1 provision ws-c2960s-48td-l
switch 2 provision ws-c2960s-48td-l
!
!

Here is the ASA5520 running configuration.

FW# show firewall
Firewall mode: Transparent

FW# show failover
Failover Off
Failover unit Secondary
Failover LAN Interface: not Configured
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 3 of 160 maximum

FW# show lacp sys-id
32768,44d3.ca0e.e0b6

FW# show lacp internal

Flags:  S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode       P - Device is in Passive mode

Channel group 2
                             LACP port     Admin     Oper    Port        Port
Port      Flags   State      Priority      Key       Key     Number      State
-----------------------------------------------------------------------------
Gi0/1     SA      bndl       32768         0x2       0x2     0x2         0x3d
Gi0/2     SA      bndl       32768         0x2       0x2     0x3         0x3d

Thank you.

Hi,

Switch 2 is currently the stack master and the LACP system-id has been derived from switch 2's stack MAC address. If you powered off switch 1 to test the resiliency, the stack MAC address would not have changed and the Etherchannel would have remained up. However, if switch 2 (the stack master) had been powered off, the stack MAC would have changed, resulting in issues with the Etherchannel.

It is possible that switch 1 was not the stack master during the initial failover testing and was then elected as the stack master between this testing and the incident. I can see in the first 'show switch' output that switch 2 was configured with the highest priority, so it is very likely that switch 2 was the stack master when you tested?

Hi willwetherman,

You are correct that SW2 became the stack master when SW1 failed in the incident. The priorities of SW1 and SW2 were set to 15 and 14 respectively after the replacement of SW1; however, SW2 remained the stack master. There is no information about the priority of SW1 before the incident.

Then a resilience test was conducted by powering off SW1. The network traffic was routed to SW2, as observed in the NIC teaming GUI. Since SW2 was the stack master, the EtherChannel stayed up on SW2.

I will probably conduct another resilience test with SW1 as the stack master to check what happens. Thank you.
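One possible way to run that test, sketched under assumptions: it relies on `reload slot` to restart a single stack member, and on SW1 winning the election because of its higher priority (15 versus 14) once SW2, the current master, goes away. It interrupts traffic, so it belongs in a maintenance window.

```
! --- 1. Before the test, on the stack: note master, stack MAC, LACP system-id
show switch
show lacp sys-id

! --- 2. Make SW1 the master by restarting the current master (switch 2);
! ---    a priority change alone does not displace a running master
reload slot 2

! --- 3. Once switch 2 has rejoined as a member, confirm SW1 is now master
show switch

! --- 4. Power off SW1 (the master), then on the remaining switch:
show switch
show lacp sys-id
show etherchannel summary

! --- 5. On the ASA, check whether the bundle survived
show port-channel summary
show lacp neighbor
```

Comparing the LACP system-id from step 1 with the one from step 4 shows whether it changed across the master failure. Step 2 is itself a master failure, so the EtherChannel may already drop there. Repeating the sequence with 'stack-mac persistent timer 0' configured would show whether the workaround holds.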

Hi Jon Marshall,

Sure.  I think the redundant interface is another possible solution.  The configuration for port-channel is extracted as follows.

FW#show config

!

errdisable recovery cause link-flap
errdisable recovery interval 60
port-channel load-balance src-dst-mac
!
vlan internal allocation policy ascending
!
!
!
interface Port-channel1
!

These settings have been in place for many years. I will discuss this option with my boss and the application vendor. Thanks.