08-13-2014 08:03 AM - edited 03-07-2019 08:22 PM
Hi there,
We currently have a number of 2960S (12.2.55) access switches that connect to Dell N3000 series distribution switches.
We have found a problem with spanning tree shutting down ports and the switches going offline. This happens intermittently and doesn't seem to coincide with anything in particular.
Looking at the log on the Ciscos, we are getting the below...
Apr 10 07:01:16.463: %LINK-3-UPDOWN: Interface Port-channel1, changed state to down
Apr 10 07:01:16.463: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/50, changed state to down
Apr 10 07:01:22.125: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi1/0/51, putting Gi1/0/51 in err-disable state
Apr 10 07:01:22.172: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi1/0/52, putting Gi1/0/52 in err-disable state
Apr 10 07:01:22.230: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po2, putting Gi1/0/51 in err-disable state
Apr 10 07:01:22.230: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po2, putting Gi1/0/52 in err-disable state
Apr 10 07:01:22.230: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po2, putting Po2 in err-disable state
I have attached the configs of the access and distribution switch.
Note the EtherChannels are on ports 1/0/49 & 1/0/50 and 1/0/51 & 1/0/52 on the Cisco.
The channel-group on the Dell is on 2/0/1 and 2/0/2.
I assume there is a Spanning Tree setup error here but I'm not sure what the problem is. Can anyone help?
-Huw
08-13-2014 08:54 AM
Configure channel mode to desirable.
http://www.cisco.com/c/en/us/support/docs/lan-switching/etherchannel/12033-89.html
A common issue during EtherChannel configuration is that the interfaces go into err-disable mode. This can be seen when Etherchannel is switched to the ON mode in one switch, and the other switch is not configured immediately. If left in this state for a minute or so, STP on the switch where EtherChannel is enabled thinks there is a loop. This causes the channeling ports to be put in err-disable state. See this example for more information on how to determine if your EtherChannel interfaces are in the err-disable state:
%SPANTREE-2-CHNL_MISCFG: Detected loop due to etherchannel misconfiguration of Gi0/9
%PM-4-ERR_DISABLE: channel-misconfig error detected on Po10, putting Gi0/9 in err-disable state
%PM-4-ERR_DISABLE: channel-misconfig error detected on Po10, putting Gi0/10 in err-disable state
Switch1#show etherchannel summary
Flags:  D - down        P - in port-channel
        I - stand-alone s - suspended
        H - Hot-standby (LACP only)
        R - Layer3      S - Layer2
        u - unsuitable for bundling
        U - in use      f - failed to allocate aggregator
        d - default port
Number of channel-groups in use: 1
Number of aggregators:           1
Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
10     Po10(SD)         -        Gi0/9(D)    Gi0/10(D)
Switch1#show interfaces GigabitEthernet 0/9 status
Port      Name               Status       Vlan       Duplex  Speed Type
Gi0/9                        err-disabled 1            auto   auto 10/100/1000BaseTX
Switch1#show interfaces GigabitEthernet 0/10 status
Port      Name               Status       Vlan       Duplex  Speed Type
Gi0/10                       err-disabled 1            auto   auto 10/100/1000BaseTX
The error message states that the EtherChannel encountered a spanning tree loop. In order to resolve the issue, set the channel mode to desirable on both sides of the connection, and then re-enable the interfaces:
Switch1#configure terminal
Enter configuration commands, one per line.  End with CNTL/Z.
Switch1(config)#interface gi0/9
Switch1(config-if)#channel-group 10 mode desirable
08-13-2014 04:33 PM
Hey Huw,
What is the spanning tree mode supported by the Dell switch?
Also, regarding the error message: when ports are bundled together on the Cisco switch, EtherChannel senses that the other device is unable to bundle the ports and is treating the logical link as two separate links. To avoid an STP loop, the Cisco switch err-disables the port-channel.
You may use ON mode, but there is a chance of an STP loop with that, so I would not suggest it; rather, check the STP mode supported on the Dell box and proceed accordingly.
HTH.
Regards,
RS.
08-14-2014 01:16 AM
Thanks both for coming back.
There don't appear to be any spanning tree commands (apart from portfast) in the config of the Dell. Is this the problem? It appears to be running 802.1w (see below)
DELLDIST1#show spanning-tree summary
Spanning Tree Adminmode........... Enabled
Spanning Tree Version............. IEEE 802.1w
BPDU Guard Mode................... Disabled
BPDU Flood Mode................... Disabled
BPDU Filter Mode.................. Disabled
Configuration Name................ F8-B1-56-33-0C-9B
Configuration Revision Level...... 0
Configuration Digest Key.......... 0xac36177f5XXXXXXXd8ab26de62
Configuration Format Selector..... 0
SWC_DIST_1#show spanning-tree
Spanning tree :Enabled - BPDU Flooding :Disabled - Portfast BPDU filtering :Disabled - mode :rstp
CST Regional Root: 80:00:F8:B1:56:33:0C:9B
Regional Root Path Cost: 0
ROOT ID
Priority 0
Address 90B1.1CF4.AA70
Path Cost 500
Root Port Po1
Hello Time 2 Sec Max Age 20 sec Forward Delay 15 sec TxHoldCount 6 sec
Bridge Max Hops 20
Bridge ID
Priority 32768
Address F8B1.XXXX.0C9B
Hello Time 2 Sec Max Age 20 sec Forward Delay 15 sec
Interfaces
Name State Prio.Nbr Cost Sts Role Restricted
--------- -------- --------- --------- ---- ----- ----------
Gi1/0/1 Enabled 128.1 20000 FWD Desg No
Gi1/0/2 Enabled 128.2 200000 FWD Desg No
Gi1/0/3 Enabled 128.3 200000 FWD Desg No
<snip>
08-14-2014 06:19 AM
Hi guys,
Please allow me to join.
Huwy, the technical reason for the err-disable issue you are having on your Catalyst switch is this: The Catalyst is receiving BPDUs from different source MAC addresses over the Gi1/0/51 or Gi1/0/52 ports that are bundled in the Po2 EtherChannel. In normal operation, all physical links of an EtherChannel bundle are represented by a single Port-channel interface to STP, and that interface in turn has a single MAC address. Whenever STP sends a BPDU over a Port-channel interface, it uses that interface's MAC address as the source MAC. The result is that regardless of how many physical links are in an EtherChannel bundle and what particular physical link the BPDU is transmitted through, the source MAC address will always be the same - and only one for the whole duration of the Port-channel interface existence. If, however, the switch receives BPDUs sent from differing source MAC addresses, it is a good indication that the sending device does not consider the ports to be bundled in an EtherChannel, and that most probably, some EtherChannel configuration mismatch exists. In these cases, the EtherChannel misconfig guard kicks in and err-disables the entire EtherChannel.
The question is: why would the Dell occasionally send BPDUs with differing source MAC addresses if it continuously considers the ports to be bundled? Are there any indications on the Dell that it intermittently considers the ports to be unbundled, perhaps due to LACP signalling problems? The unbundling would happen before the err-disabling takes place.
Best regards,
Peter
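The check described in the post above can be pictured with a small model. This is only a conceptual sketch in Python, not Cisco's actual implementation: the class and method names are hypothetical, the rule of tripping on the second distinct source MAC is a simplification (the real guard's thresholds and timing are not described in this thread), and the MAC addresses are made up.

```python
class MisconfigGuardModel:
    """Toy model: track BPDU source MACs seen on each Port-channel."""

    def __init__(self):
        self.seen = {}             # Port-channel name -> set of source MACs
        self.err_disabled = set()  # Port-channels the model has shut down

    def receive_bpdu(self, port_channel, src_mac):
        """Record a BPDU; return True if the bundle is now err-disabled."""
        if port_channel in self.err_disabled:
            return True
        macs = self.seen.setdefault(port_channel, set())
        macs.add(src_mac.lower())
        # More than one source MAC suggests the neighbour is not bundling.
        if len(macs) > 1:
            self.err_disabled.add(port_channel)
        return port_channel in self.err_disabled


guard = MisconfigGuardModel()
# A correctly bundled neighbour always uses the single Port-channel MAC.
print(guard.receive_bpdu("Po2", "f8b1.0000.0001"))  # False
print(guard.receive_bpdu("Po2", "f8b1.0000.0001"))  # False
# A second source MAC on the same bundle looks like an unbundled neighbour.
print(guard.receive_bpdu("Po2", "f8b1.0000.0002"))  # True
```

The point of the model is only that the decision is driven by the set of source MACs seen on the logical interface, regardless of which physical member link delivered the BPDU.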
08-14-2014 08:03 AM
Hi Peter,
Thanks for your response - that explains it very clearly.
We are currently using 2x Dell S600s as our core. These are configured with VLT so they appear as one logical switch. Could this be the reason behind this?
I have attached a diagram showing the simplified structure of the switches
-Huw
08-14-2014 08:03 AM
Hi Peter,
Having read your response again, it sounds like the issue would just be due to the EtherChannel configuration on the Dell N3000.
From a Dell document on interoperability (http://en.community.dell.com/techcenter/networking/m/networking_files/20440256.aspx), the only difference I can see is that I haven't got the below command under the port channel:
C6504(config-if)#switchport trunk encapsulation dot1q
Do you think this is likely to make a difference?
Cheers,
H
08-14-2014 01:36 PM
Hi Huw,
Most probably, the problem is on the N3000 switches.
Your switch_structure.jpg diagram shows that a single Cat2960-S is connected to two N3000 switches. I just want to make sure that you have not bundled links going to different N3000 switches into a single EtherChannel - that would be incorrect, and it would be a possible cause of the problem you are seeing.
The switchport trunk encapsulation dot1q command is not supported on your Catalyst switches because dot1q is the only trunk encapsulation they support. This command is supported on higher-end switches, such as 3560, 4500 or 6500, that still support the ISL trunking protocol, where it allows you to choose the particular trunk encapsulation. On the Cat2960, however, this command is not available because it is not needed - the trunks are already running dot1q.
What kind of STP are you running in your network? Is it consistently STP, RSTP, MST, PVST+ or RPVST+ on all switches? Especially with Dell switches, are you running their own per-VLAN version of STP?
This issue is becoming very interesting.
Best regards,
Peter
08-14-2014 02:12 PM
Hi Peter,
I've just run a show cdp neighbors command on the Cisco access switch and can confirm that the LAGs are correct (there are 2 LAGs, each going to a single switch).
The core, distribution and access switches all seem to have spanning tree in RSTP mode. See the show spanning-tree output above from the Dell.
Can you suggest how I can troubleshoot this further? It's looking like a very difficult problem!
As a workaround, could the "etherchannel guard" be disabled on the Cisco?
Cheers,
Huw
c2960s#sh spanning-tree
VLAN0001
Spanning tree enabled protocol rstp
Root ID Priority 0
Address 90b1.1cf4.aa70
Cost 503
Port 224 (Port-channel1)
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 32769 (priority 32768 sys-id-ext 1)
Address c8f9.f9b2.0f00
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 300 sec
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1 Desg FWD 4 128.1 P2p
Gi1/0/2 Desg FWD 4 128.2 P2p
Gi1/0/4 Desg FWD 4 128.4 P2p
Gi1/0/36 Desg FWD 19 128.36 P2p
Po1 Root FWD 3 128.224 P2p
Po2 Desg FWD 3 128.232 P2p
08-15-2014 02:45 AM
Hi Huw,
The above output from the Dell shows that the switch is running 802.1w (RSTP), but it is not clear whether that is plain RSTP or Dell's RSTP-PV. Please try running the show spanning-tree vlan 1 command on your Dell switches and check whether the output contains the following lines:
VLAN 1 Spanning-tree enabled protocol rpvst
Feel free to substitute any other existing VLAN for VLAN 1.
What switch is the root switch, by the way? Is it the VLT pair, or any other switch? Can you make sure on the Cat2960 switches using show spanning-tree root command that they both report the same - and correct - root switch for all VLANs?
Deactivating the EtherChannel Misconfig Guard can be performed on your Cat2960 switches using the following global config mode command:
no spanning-tree etherchannel guard misconfig
In any case, if we want to narrow down the problem, we need more information about the processes taking place at the moment of err-disabling the ports. Do your network operation guidelines allow you to run debugs on your switches? If they do, would you mind keeping the EtherChannel Misconfig Guard activated and having these debugs activated on your Cat2960?
debug spanning-tree etherchannel
debug spanning-tree exceptions
debug spanning-tree switch errors
!
debug spanning-tree events
debug spanning-tree general
The first three debugs should only produce output when error conditions ensue; the last two debugs produce very minimal output in a stable network and provide more context for the events that possibly take place around the time of err-disabling the ports. These debugs should not impair your switch operation - of course, as with all debugging, caution is always called for.
Best regards,
Peter
EDIT: Added the forgotten no keyword to the command for EtherChannel Misconfig Guard deactivation
08-15-2014 02:45 AM
Hi Peter,
Below is the output of the commands you asked me to run.
Dell
DellDist1#show spanning-tree vlan 1
Neither PVST nor Rapid-PVST is enabled
DellDist1#show spanning-tree vlan 1681
Neither PVST nor Rapid-PVST is enabled
Cisco
C2960S#show spanning-tree root
Root Hello Max Fwd
Vlan Root ID Cost Time Age Dly Root Port
---------------- -------------------- --------- ----- --- --- ------------
VLAN0001 0 90b1.1cf4.aa70 503 2 20 15 Po1
VLAN1016 33784 3cce.73ba.9280 3 2 20 15 Po1
VLAN1112 33880 0017.e05e.2880 3 2 20 15 Po1
VLAN1128 33896 0017.e05e.2880 3 2 20 15 Po1
VLAN1136 33904 0017.e05e.2880 3 2 20 15 Po1
VLAN1144 33912 0017.e05e.2880 3 2 20 15 Po1
VLAN1152 33920 0017.e05e.2880 3 2 20 15 Po1
VLAN1681 34449 0017.e05e.2880 3 2 20 15 Po1
----------------------
I have enabled debugs. Will the output be shown via "show log"?
Cheers once again,
Huw
08-15-2014 06:10 AM
Hi Huw,
It seems as if the Dell switches were not running the per-VLAN RSTP variant. That would mean that your Dell switches are most probably running plain RSTP without VLAN support. I now have a hypothesis that this may be at the core of the issues you are seeing. Let me elaborate.
Cisco switches use per-VLAN STP (or RSTP) variants, running a separate STP instance for each VLAN. To provide interoperability on trunks, the STP instance for VLAN1 sends and receives normal IEEE-compatible BPDUs and is therefore able to seamlessly interoperate with all other switches out there, even if they do not speak per-VLAN STP. In addition, all per-VLAN STP instances send and receive their per-VLAN BPDUs in a slightly changed format, and most importantly, addressed to a different group MAC address than normal IEEE BPDUs. To switches that do not support per-VLAN STP, these per-VLAN BPDUs are just multicast frames that are flooded throughout the network and not processed.
This flooding of per-VLAN BPDUs by non-per-VLAN-STP-aware switches is crucial to understanding what could be happening in your network. From the viewpoint of a Cat2960 switch, BPDUs for VLAN1 arriving through a Port-channel are sourced by the immediate neighboring Dell switch. However, per-VLAN BPDUs arriving through the same Port-channel are sourced by some other Catalyst switch in your network, possibly several "switch hops" away, not by the Dell - the Dell just floods them. Now, if there is a topology change somewhere in your network that affects some other Catalyst switch or switches, they may start sending BPDUs from their different ports until the network stabilizes into a new loop-free topology. Because these BPDUs - quite possibly sourced from different ports on Catalyst switches - are just flooded across the Dell switches, they will all arrive over the same Port-channel, and they trick the EtherChannel Misconfig Guard into thinking that the opposite device is not treating the ports as bundled.
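To make the flooding argument concrete, here is a minimal Python sketch of how a switch without per-VLAN STP would treat the two kinds of BPDU. The two destination addresses are the well-known ones (the IEEE bridge group address and Cisco's per-VLAN SSTP address); the function names and the sample source MACs are hypothetical and chosen only for illustration.

```python
IEEE_STP_MAC = "01:80:c2:00:00:00"  # IEEE bridge group address
PVST_MAC = "01:00:0c:cc:cc:cd"      # Cisco per-VLAN (SSTP) group address


def handle_on_plain_rstp_switch(dst_mac):
    """What a switch without per-VLAN STP does with a received frame."""
    if dst_mac.lower() == IEEE_STP_MAC:
        return "process"  # consumed by the local RSTP instance
    return "flood"        # ordinary multicast, forwarded unchanged


def sources_seen_behind(frames):
    """Source MACs that reach the far side of the flooding switch."""
    return {src for src, dst in frames
            if handle_on_plain_rstp_switch(dst) == "flood"}


# Hypothetical frames arriving at the Dell: (source MAC, destination MAC).
frames = [
    ("90:b1:1c:00:00:01", IEEE_STP_MAC),  # an IEEE BPDU, handled locally
    ("00:17:e0:00:00:0a", PVST_MAC),      # remote Catalyst, port A
    ("00:17:e0:00:00:0b", PVST_MAC),      # remote Catalyst, port B
]
print(sorted(sources_seen_behind(frames)))
# ['00:17:e0:00:00:0a', '00:17:e0:00:00:0b']
```

Both per-VLAN BPDUs pass through with their original source MACs intact, so a Catalyst on the far side of the Port-channel sees two different senders on one logical link.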
The output of the show spanning-tree root command from the Cat2960 suggests that most probably, the Dell switches do not run per-VLAN RSTP - otherwise, they would become root switches in all VLANs. You can see that while VLAN1 has a Dell-based root switch (based on the OUI of 90:b1:1c in the MAC address), all other VLANs use Cisco switches as their respective roots - and even more, VLAN1016 has a different root switch than the other VLANs. All VLANs that have Cisco switches as their root switch in addition have their root switch priority set to 32768 (subtract the VLAN number from the priority shown in the output). Clearly, this is the default configuration of a Cisco switch. These differences indeed suggest that the Dell switches are not running per-VLAN RSTP.
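The subtraction mentioned above can be checked quickly. The short Python sketch below uses the priority values copied from the show spanning-tree root output earlier in this thread; VLAN1 is left out because its root is the Dell switch, which does not add a VLAN number to its priority. The function name is just an illustrative choice.

```python
# (VLAN ID, root priority as displayed) for the VLANs with a Cisco root.
rows = [
    (1016, 33784),
    (1112, 33880),
    (1128, 33896),
    (1136, 33904),
    (1144, 33912),
    (1152, 33920),
    (1681, 34449),
]


def configured_priority(vlan_id, displayed):
    """Remove the extended system ID (the VLAN number) from the priority."""
    return displayed - vlan_id


print({configured_priority(vlan_id, shown) for vlan_id, shown in rows})
# {32768}
```

Every row reduces to 32768, the default bridge priority, which supports the reading that none of these root switches has had its priority configured.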
I believe you should start the RSTP-PV (as Dell calls it) on all your Dell switches. According to the interoperability document you have referenced, the Dell switches should be configured with:
spanning-tree mode rapid-pvst
All your Dell switches should be configured with this command. I recommend performing this change during a maintenance window, as there may be a transient connectivity outage. In addition, make sure that a reasonable switch becomes the root switch for all VLANs. Perhaps the VLT switches should be configured for this. The command should be fairly simple:
spanning-tree vlan 1-4094 priority 0
Just in case you want to proceed with the debugs as well, the debugs will indeed be visible in the show logging command on the Catalyst - but I recommend configuring the buffer for the logs to be bigger and to store all messages including the debugs:
logging buffered 100000 debugging
This command on Catalyst switches will make sure there is 100 Kbytes reserved for the log buffer, and all messages including the debugs will be stored there.
I suggest starting the RSTP-PV on your Dell switches as the first step and seeing if the issue stops occurring.
Best regards,
Peter
08-18-2014 02:04 AM
Thanks again for the excellent response. I think we are going to disable the misconfig guard as it's the simplest solution.
Just to be clear is the command to disable the etherchannel misconfig guard:-
no spanning-tree etherchannel guard misconfig
From what I can see:-
spanning-tree etherchannel guard misconfig
Enables the guard.
Cheers,
-H
08-18-2014 04:22 AM
Hello Huw,
"I think we are going to disable the misconfig guard as it's the simplest solution."
It may be the simplest solution but in the long run, it is definitely not the best solution. You are basically choosing a workaround for a more profound problem of running different versions of STP in a single switched topology. I understand that deactivating the EtherChannel Misconfig Guard may be the most straightforward solution for the moment but please consider also aligning the STP versions used in your network.
Ideally, I would recommend running MST as that is the preferred STP version for multivendor environments. It requires a bit of pre-planning but in the long run, it proves beneficial.
The least disruptive STP streamlining, however, would be to simply activate the RSTP-PV on your Dell switches. While requiring a maintenance window, it should be very straightforward to implement.
Please note that currently, you do not really have your switched environment under control. The output of show spanning-tree root shows very nicely that in VLANs other than VLAN1, the per-VLAN RSTP is running on default (unconfigured) values, ending up with random Catalyst switches being root switches. In addition, there is no way you can make your Dell switches root switches in these VLANs unless you run RSTP-PV on them. While your network works right now, it is not the way it should be left running for the foreseeable future.
So I encourage you to deactivate the EtherChannel Misconfig Guard for the time being to avoid the outages caused by it, but I also encourage you even more to align the STP versions running on your switches so that only a single STP version is run across your network.
Anyway, yes, you're right - the correct command to deactivate the EtherChannel Misconfig Guard is no spanning-tree etherchannel guard misconfig. I forgot to put in the no keyword in my previous responses - I've corrected them as well. Thanks!
Best regards,
Peter