05-24-2017 02:04 AM - edited 03-08-2019 10:42 AM
A single ASA5520 (FW) connecting to the Internet is configured with an EtherChannel to a switch stack (2 x C2960S) at my site. The simplified network is shown in the diagram below.
This network design had been in place for a few years without problems, and the corresponding resilience test had passed: when the master stack switch (SW1) was powered off, the slave switch (SW2) took over the workload and traffic was still routed via SW2 between FW and the servers.
Recently, the master stack switch (SW1) failed during an incident. The slave switch (SW2) took over the workload, but the servers could not access the Internet.
Is this a network design flaw? Any suggestions for remediation? Thank you very much.
05-25-2017 02:11 PM
Hi,
I have run into this issue before and discovered that the Cisco ASA does not support an EtherChannel whose ports are connected to separate switches in a stack.
"The ASA does not support connecting an EtherChannel to a switch stack. If the ASA EtherChannel is connected cross stack, and if the Master switch is powered down, then the EtherChannel connected to the remaining switch will not come up."
My understanding is that the switch stack derives its LACP system-id from the stack MAC address, which by default is the MAC address of the stack master. If the stack master fails, the LACP system-id changes; the ASA does not tolerate this, and the EtherChannel fails.
This issue can be fixed by configuring 'stack-mac persistent timer 0' on the switch stack, which keeps the stack MAC address, and therefore the LACP system-id, unchanged when the stack master fails. I believe it is also possible to configure the EtherChannel in unconditional mode (mode on) instead of LACP to work around the issue, but I have not tested this myself.
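For illustration, the stack-side change might look like the following on a C2960S stack. This is a sketch rather than a tested procedure: the member port numbers (Gi1/0/47, Gi2/0/47) and the channel-group numbers are hypothetical, and the mode on lines are the untested alternative mentioned above. A static bundle does not negotiate, so mode on would have to be set on both the switch and the ASA, and the bundle will be disrupted while the two ends disagree.

```text
! Keep the stack MAC (and so the LACP system-id) after a master failover
SW1# configure terminal
SW1(config)# stack-mac persistent timer 0
SW1(config)# end
SW1# copy running-config startup-config
SW1# show switch
!
! Untested alternative: static EtherChannel (mode on) on BOTH ends
! Switch side (hypothetical member ports, hypothetical Port-channel1):
SW1(config)# interface range GigabitEthernet1/0/47 , GigabitEthernet2/0/47
SW1(config-if-range)# channel-group 1 mode on
! ASA side (hypothetical channel group 2 on Gi0/1 and Gi0/2):
FW(config)# interface GigabitEthernet0/1
FW(config-if)# channel-group 2 mode on
FW(config-if)# interface GigabitEthernet0/2
FW(config-if)# channel-group 2 mode on
```

With a timer of 0 the stack keeps using the old master's MAC address indefinitely, so it is worth not reconnecting the failed unit elsewhere on the same network while the stack still uses its address.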
Several Cisco bug IDs have been filed for this issue, for example: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCtw63011/?referring_site=bugquickviewredir
I hope that this helps.
05-26-2017 04:47 AM
Just to add: if you are not comfortable with the EtherChannel solution, or it does not work, you can use the ASA redundant interface feature, which lets you connect the two ASA interfaces to different switches in the stack.
A redundant interface only uses one member link at a time, but assuming the connections are gigabit you should not see any performance issues as far as I am aware.
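A minimal sketch of that approach on the ASA, assuming the two ports are Gi0/1 and Gi0/2 in channel group 2 with LACP active (as in the 'show lacp internal' output posted later in this thread) and that the interface settings currently on the port-channel are moved across. Check the exact steps against the ASA configuration guide for your software version, and expect an outage while the interfaces are reconfigured.

```text
! Free the nameif on the port-channel so it can be reused on Redundant1
FW(config)# interface Port-channel2
FW(config-if)# no nameif
! Take the physical ports out of the EtherChannel
FW(config-if)# interface GigabitEthernet0/1
FW(config-if)# no channel-group 2 mode active
FW(config-if)# interface GigabitEthernet0/2
FW(config-if)# no channel-group 2 mode active
!
! Build the redundant pair; the first member added is the active interface
FW(config-if)# interface Redundant1
FW(config-if)# member-interface GigabitEthernet0/1
FW(config-if)# member-interface GigabitEthernet0/2
! Re-apply here the nameif and related settings that were on Port-channel2
!
! Check which member is currently active
FW# show interface Redundant1 detail
```

Because the pair is active/standby, the matching switch ports should be ordinary (non-channel) ports; each one can sit on a different stack member.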
Just a suggestion.
Jon
05-25-2017 11:30 AM
Have you tested server failover to make sure that, if SW1 is gone, traffic can be forwarded to SW2 using the second NIC?
When SW1 fails, can the servers still ping the gateway? Is the firewall the gateway for the servers?
05-25-2017 06:56 PM
Hi Reza Sharifi,
Thank you for your reply.
(1) In the resilience test, the master unit (SW1) of the C2960S stack was powered off. The NIC teaming monitor showed that network traffic from the server was routed to the secondary unit (SW2) of the stack. The client could ping and access the server via the ASA5520 (FW) and SW2.
(2) In the incident, the master unit (SW1) of the C2960S stack failed; the failed switch appeared to be powered off. The NIC teaming monitor again showed that network traffic from the server was routed to SW2. The client could reach the ASA5520 (FW) but could not ping or access the server, and the server could not reach the ASA5520 (FW).
05-25-2017 07:45 PM
Hi willwetherman,
Thank you for your detailed explanation about the (LACP) System-ID.
I agree that configuring 'stack-mac persistent timer 0' would be a workaround. However, I am not sure how to test it, because the faulty C2960S (SW1) has already been replaced with a spare by the maintenance vendor.
The resilience test was successful before the switch failure, and the regression and resilience tests were also successful after the switch was replaced. Failover succeeded in the resilience test but failed in the incident.
Since the resilience test succeeds when SW1 is powered off, how can the SW1 failure be simulated to verify 'stack-mac persistent timer 0' on the stack?
Should the network design be changed to unstacked switches? What remediation did you apply in your case?
05-26-2017 12:31 AM
Hi Joe,
That is an interesting issue. When I first encountered it, and every time since, the ASA EtherChannel always failed when the stack master was powered off. I'm unsure why your scenario would be different.
Can you confirm whether the EtherChannel uses LACP or unconditional mode (mode on)? Also, has the stack-mac persistent timer been set to a non-zero value on the switch stack?
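One way to answer both questions is with standard show commands on each side. The comments describe what to look for; the prompts are illustrative.

```text
! On the switch stack: the Protocol column reads LACP, or "-" for mode on
SW1# show etherchannel summary
! Look for a "stack-mac persistent timer" line
SW1# show running-config | include stack-mac
! On the ASA: per-port mode (flag A = active, P = passive) and the
! partner (stack) system ID learned over LACP
FW# show lacp internal
FW# show lacp neighbor
```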
05-26-2017 02:14 AM
Hi willwetherman,
I learned from an old document that the resilience failover test had been successful.
No configuration saved before the incident can be found. Here is the 'show switch' output after the incident.
SW1#show switch
Switch/Stack Mac Address : 2c3e.cf5e.3380
                                         H/W      Current
Switch#  Role   Mac Address     Priority Version  State
----------------------------------------------------------
 1       Member 0000.0000.0000     0      1       Removed
*2       Master 2c3e.cf5e.3380     1      1       Ready
Here is the 'show switch' output after the faulty switch was replaced.
SW1#show switch
Switch/Stack Mac Address : 2c3e.cf5e.3380
                                         H/W      Current
Switch#  Role   Mac Address     Priority Version  State
----------------------------------------------------------
 1       Member 2c3e.cf5e.2600     15     1       Ready
*2       Master 2c3e.cf5e.3380     14     1       Ready
The running configuration contains "no stack-mac persistent timer" (extract below).
SW1#show running-config
......
switch 1 provision ws-c2960s-48td-l
switch 2 provision ws-c2960s-48td-l
!
!
Here is the relevant ASA5520 output.
FW# show firewall
Firewall mode: Transparent
FW# show failover
Failover Off
Failover unit Secondary
Failover LAN Interface: not Configured
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 3 of 160 maximum
FW# show lacp sys-id
32768,44d3.ca0e.e0b6
FW# show lacp internal
Flags:  S - Device is requesting Slow LACPDUs
        F - Device is requesting Fast LACPDUs
        A - Device is in Active mode        P - Device is in Passive mode
Channel group 2
                            LACP port     Admin     Oper    Port        Port
Port      Flags   State     Priority      Key       Key     Number      State
-----------------------------------------------------------------------------
Gi0/1     SA      bndl      32768         0x2       0x2     0x2         0x3d
Gi0/2     SA      bndl      32768         0x2       0x2     0x3         0x3d
Thank you.
05-26-2017 02:27 AM
Hi,
Switch 2 is currently the stack master, and the LACP system-id has been derived from switch 2's stack MAC address. If you powered off switch 1 to test resiliency, the stack MAC address would not have changed and the EtherChannel would have remained up. However, if switch 2 (the stack master) had been powered off, the stack MAC would have changed, causing problems with the EtherChannel.
It is possible that switch 1 was not the stack master during the initial failover testing and was then re-elected as stack master between that testing and the incident. The first 'show switch' output shows switch 2 with the highest priority, so is it likely that switch 2 was the stack master when you tested?
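A way to reproduce the failure mode described above is to power off whichever switch is currently the stack master, rather than always SW1. The sequence below is a sketch of such a test; the expected outcomes in the comments are what this explanation predicts, not a verified result.

```text
! 1. Before the test: find the master (marked *) and note the stack MAC
SW1# show switch
SW1# show lacp sys-id
! 2. On the ASA: record the partner system ID and check both ports are "bndl"
FW# show lacp neighbor
FW# show lacp internal
! 3. Power off the switch that step 1 showed as Master
! 4. Repeat steps 1 and 2 from the surviving switch and from the ASA.
!    With "no stack-mac persistent timer", the stack MAC and the partner
!    system ID would be expected to change, and the ports may not rebundle.
!    With "stack-mac persistent timer 0", both should stay the same.
```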
05-27-2017 05:44 PM
Hi willwetherman,
You are correct that SW2 became the stack master when SW1 failed in the incident. The priorities of SW1 and SW2 were set to 15 and 14 respectively after SW1 was replaced; however, SW2 remained the stack master. There is no record of SW1's priority before the incident.
A resilience test was then conducted by powering off SW1. The NIC teaming GUI showed the traffic being routed to SW2. Since SW2 was the stack master, the EtherChannel stayed up on SW2.
I will probably conduct another resilience test with SW1 as stack master to see what happens. Thank you.
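For that retest, note that stack priority is only evaluated during a master election, which is why SW2 stayed master after the priorities were changed. A sketch of how SW1 could be made master again, assuming a maintenance window because the whole stack reloads:

```text
! Priorities as already shown in 'show switch' (SW1 = 15, SW2 = 14)
SW1# configure terminal
SW1(config)# switch 1 priority 15
SW1(config)# switch 2 priority 14
SW1(config)# end
SW1# copy running-config startup-config
! Reloading the whole stack forces a new election; the highest priority should win
SW1# reload
! Afterwards "*1 ... Master" is expected before repeating the failover test
SW1# show switch
```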
05-28-2017 06:19 PM
Hi Jon Marshall,
Sure. I think the redundant interface is another possible solution. Here is the extracted port-channel configuration:
FW#show config
!
errdisable recovery cause link-flap
errdisable recovery interval 60
port-channel load-balance src-dst-mac
!
vlan internal allocation policy ascending
!
!
!
interface Port-channel1
!
These settings have been in place for many years. I will discuss this option with my boss and the application vendor. Thanks.
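If the redundant interface option is chosen, the stack side would also need its ports taken out of Port-channel1, because a redundant interface is an active/standby pair rather than a bundle. A sketch, with hypothetical member port numbers:

```text
SW1# configure terminal
! Hypothetical ports facing ASA Gi0/1 and Gi0/2, one per stack member
SW1(config)# interface range GigabitEthernet1/0/47 , GigabitEthernet2/0/47
SW1(config-if-range)# no channel-group 1
! Re-apply here any switchport settings that were carried on Port-channel1
SW1(config-if-range)# exit
SW1(config)# no interface Port-channel1
SW1(config)# end
SW1# show etherchannel summary
```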