Spanning tree problem

huwyhuwy123
Level 1

Hi there,

We currently have a number of 2960S (12.2.55) access switches that connect to Dell N3000 series distribution switches.

We have found a problem with spanning tree shutting down ports and the switches going offline. This happens intermittently and doesn't seem to coincide with anything in particular.

Looking at the log on the Ciscos, we are getting the following:

Apr 10 07:01:16.463: %LINK-3-UPDOWN: Interface Port-channel1, changed state to down
Apr 10 07:01:16.463: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/50, changed state to down
Apr 10 07:01:22.125: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi1/0/51, putting Gi1/0/51 in err-disable state
Apr 10 07:01:22.172: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi1/0/52, putting Gi1/0/52 in err-disable state
Apr 10 07:01:22.230: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po2, putting Gi1/0/51 in err-disable state
Apr 10 07:01:22.230: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po2, putting Gi1/0/52 in err-disable state
Apr 10 07:01:22.230: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po2, putting Po2 in err-disable state

I have attached the configs of the access and distribution switch.

Note that the EtherChannels are on ports 1/0/49 & 1/0/50 and 1/0/51 & 1/0/52 on the Cisco.

The channel-group on the Dell is on 2/0/1 and 2/0/2.

I assume there is a Spanning Tree setup error here but I'm not sure what the problem is. Can anyone help?

-Huw

 


Hi,

Configure the channel mode to desirable.

http://www.cisco.com/c/en/us/support/docs/lan-switching/etherchannel/12033-89.html

 

Err-Disable State

A common issue during EtherChannel configuration is that the interfaces go into err-disable mode. This can be seen when EtherChannel is switched to the ON mode in one switch, and the other switch is not configured immediately. If left in this state for a minute or so, STP on the switch where EtherChannel is enabled thinks there is a loop. This causes the channeling ports to be put in err-disable state. See this example for more information on how to determine if your EtherChannel interfaces are in the err-disable state:

%SPANTREE-2-CHNL_MISCFG: Detected loop due to etherchannel misconfiguration of Gi0/9
%PM-4-ERR_DISABLE: channel-misconfig error detected on Po10, putting Gi0/9 in err-disable state
%PM-4-ERR_DISABLE: channel-misconfig error detected on Po10, putting Gi0/10 in err-disable state
Switch1#show etherchannel summary
Flags:  D - down        P - in port-channel
        I - stand-alone s - suspended
        H - Hot-standby (LACP only)
        R - Layer3      S - Layer2
        u - unsuitable for bundling
        U - in use      f - failed to allocate aggregator
        d - default port

Number of channel-groups in use: 1
Number of aggregators:           1

Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
10     Po10(SD)         -        Gi0/9(D)    Gi0/10(D)

Switch1#show interfaces GigabitEthernet 0/9 status

Port      Name               Status       Vlan       Duplex  Speed Type
Gi0/9                        err-disabled 1            auto   auto 10/100/1000BaseTX

Switch1#show interfaces GigabitEthernet 0/10 status

Port      Name               Status       Vlan       Duplex  Speed Type
Gi0/10                       err-disabled 1            auto   auto 10/100/1000BaseTX

The error message states that the EtherChannel encountered a spanning tree loop. In order to resolve the issue, set the channel mode to desirable on both sides of the connection, and then re-enable the interfaces:

Switch1#configure terminal
Enter configuration commands, one per line.  End with CNTL/Z.
Switch1(config)#interface gi0/9
Switch1(config-if)#channel-group 10 mode desirable

Rajeev Sharma
Cisco Employee

Hey Huw,

What is the spanning tree mode supported by the Dell switch?

Regarding the error message: when ports are bundled together on the Cisco switch, EtherChannel senses that the other device is unable to bundle the ports and is treating the logical link as two separate links. To avoid any STP loop, the Cisco switch therefore err-disables the port-channel.

You may use ON mode, but there is a chance of an STP loop with that, so I would not suggest it; rather, check the STP mode supported on the Dell box and proceed accordingly.

HTH.

Regards,

RS.

Thanks both for coming back.

 

There don't appear to be any spanning tree commands (apart from portfast) in the config of the Dell. Is this the problem? It appears to be running 802.1w (see below)

 

DELLDIST1#show spanning-tree summary

Spanning Tree Adminmode........... Enabled
Spanning Tree Version............. IEEE 802.1w
BPDU Guard Mode................... Disabled
BPDU Flood Mode................... Disabled

BPDU Filter Mode.................. Disabled
Configuration Name................ F8-B1-56-33-0C-9B
Configuration Revision Level...... 0
Configuration Digest Key.......... 0xac36177f5XXXXXXXd8ab26de62
Configuration Format Selector..... 0

SWC_DIST_1#show spanning-tree

Spanning tree :Enabled - BPDU Flooding :Disabled - Portfast BPDU filtering :Disabled - mode :rstp
CST Regional Root:        80:00:F8:B1:56:33:0C:9B
Regional Root Path Cost:  0
ROOT ID
              Priority        0
              Address         90B1.1CF4.AA70
              Path Cost       500
              Root Port       Po1
              Hello Time 2 Sec Max Age 20 sec Forward Delay 15 sec TxHoldCount 6 sec
              Bridge Max Hops 20
Bridge ID
              Priority        32768
              Address         F8B1.XXXX.0C9B
              Hello Time 2 Sec Max Age 20 sec Forward Delay 15 sec
Interfaces

Name      State    Prio.Nbr  Cost      Sts  Role  Restricted
--------- -------- --------- --------- ---- ----- ----------
Gi1/0/1   Enabled  128.1     20000     FWD  Desg  No
Gi1/0/2   Enabled  128.2     200000    FWD  Desg  No
Gi1/0/3   Enabled  128.3     200000    FWD  Desg  No

<snip>

 

 

Hi guys,

Please allow me to join.

Huwy, the technical reason for the err-disable issue you are having on your Catalyst switch is this: The Catalyst is receiving BPDUs from different source MAC addresses over the Gi1/0/51 or Gi1/0/52 ports that are bundled in the Po2 EtherChannel. In normal operation, all physical links of an EtherChannel bundle are represented by a single Port-channel interface to STP, and that interface in turn has a single MAC address. Whenever STP sends a BPDU over a Port-channel interface, it uses that interface's MAC address as the source MAC. The result is that regardless of how many physical links are in an EtherChannel bundle and what particular physical link the BPDU is transmitted through, the source MAC address will always be the same - and only one for the whole duration of the Port-channel interface existence. If, however, the switch receives BPDUs sent from differing source MAC addresses, it is a good indication that the sending device does not consider the ports to be bundled in an EtherChannel, and that most probably, some EtherChannel configuration mismatch exists. In these cases, the EtherChannel misconfig guard kicks in and err-disables the entire EtherChannel.
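
To make the check described above concrete, here is a minimal Python sketch of the idea. It is only a toy model, not Cisco's actual implementation: the class name, the MAC addresses, and the "more than one source MAC" threshold are all assumptions for illustration, and the real guard very likely involves timers and counters that are not modelled here.

```python
from collections import defaultdict


class MisconfigGuardModel:
    """Toy model of the EtherChannel misconfig guard (hypothetical, simplified)."""

    def __init__(self):
        # port-channel name -> set of BPDU source MACs seen on it
        self.seen = defaultdict(set)
        self.err_disabled = set()

    def receive_bpdu(self, port_channel, src_mac):
        """Return True if the BPDU is accepted, False if the bundle is err-disabled."""
        if port_channel in self.err_disabled:
            return False
        self.seen[port_channel].add(src_mac.lower())
        # A correctly bundled neighbor always uses one source MAC per Port-channel.
        if len(self.seen[port_channel]) > 1:
            self.err_disabled.add(port_channel)
            return False
        return True


guard = MisconfigGuardModel()
results = [
    guard.receive_bpdu("Po2", "aaaa.aaaa.0001"),  # neighbor's Port-channel MAC
    guard.receive_bpdu("Po2", "AAAA.AAAA.0001"),  # same MAC again: still fine
    guard.receive_bpdu("Po2", "bbbb.bbbb.0002"),  # a second source MAC: guard trips
]
print(results, guard.err_disabled)
```

In this model the whole Port-channel is disabled as soon as a second source MAC shows up, which mirrors the behaviour in your log where Gi1/0/51, Gi1/0/52 and Po2 all go err-disabled together.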

The question is: why would the Dell occasionally send BPDUs with differing source MAC addresses if it continuously considers the ports to be bundled? Are there any indications on the Dell that it intermittently considers the ports to be unbundled, perhaps due to LACP signalling problems? The unbundling would happen before the err-disabling takes place.

Best regards,
Peter

Hi Peter,

Thanks for your response - that explains it very clearly.

We are currently using 2x Dell S600s as our core. These are configured with VLT so they appear as one logical switch. Could this be the reason behind this?

I have attached a diagram showing the simplified structure of the switches

-Huw

 

Hi Peter,

Having read your response again, it sounds like the issue would just be due to the EtherChannel configuration on the Dell N3000.

From a Dell document on interoperability (http://en.community.dell.com/techcenter/networking/m/networking_files/20440256.aspx), the only difference I can see is that I haven't got the below command under the port channel:

C6504(config-if)#switchport trunk encapsulation dot1q 

Do you think this is likely to make a difference?

Cheers,

H

 

 

 

 

 

Hi Huw,

Most probably, the problem is on the N3000 switches.

Your switch_structure.jpg diagram shows that a single Cat2960-S is connected to two N3000 switches. I just want to make sure that you have not bundled links going to different N3000 switches into a single EtherChannel - that would be incorrect, and it would be a possible cause of the problem you are seeing.

The switchport trunk encapsulation dot1q command is not supported on your Catalyst switches because dot1q is the only trunk encapsulation they support. The command exists on higher-end switches, such as the 3560, 4500 or 6500, that still support the ISL trunking protocol, where it lets you choose the trunk encapsulation. On the Cat2960, however, the command is not available because it is not needed - the trunks are already running dot1q.

What kind of STP are you running in your network? Is it consistently STP, RSTP, MST, PVST+ or RPVST+ on all switches? Especially with Dell switches, are you running their own per-VLAN version of STP?

This issue is becoming very interesting.

Best regards,
Peter

Hi Peter,

I've just run a show cdp neigh command on the Cisco access switch and can confirm that the LAGs are correct (there are 2 LAGs, each going to a single switch).

The core, distribution and access switches all seem to have spanning tree in RSTP mode. See the show spanning-tree output above from the Dell.

Can you suggest how I can troubleshoot this further? It's looking like a very difficult problem!

As a workaround, could the "etherchannel guard" be disabled on the Cisco?

Cheers,

Huw

 

 

c2960s#sh spanning-tree

VLAN0001
  Spanning tree enabled protocol rstp
  Root ID    Priority    0
             Address     90b1.1cf4.aa70
             Cost        503
             Port        224 (Port-channel1)
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32769  (priority 32768 sys-id-ext 1)
             Address     c8f9.f9b2.0f00
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1             Desg FWD 4         128.1    P2p
Gi1/0/2             Desg FWD 4         128.2    P2p
Gi1/0/4             Desg FWD 4         128.4    P2p
Gi1/0/36            Desg FWD 19        128.36   P2p
Po1                 Root FWD 3         128.224  P2p
Po2                 Desg FWD 3         128.232  P2p

 

 

 

 

Hi Huw,

The above output from the Dell shows that the switch is running 802.1w (RSTP), but it is not clear whether that is plain RSTP or Dell's RSTP-PV. Please try running show spanning-tree vlan 1 on your Dell switches and check whether the output contains the following lines:

VLAN 1
 Spanning-tree enabled protocol rpvst

Feel free to substitute any other existing VLAN for VLAN 1.

What switch is the root switch, by the way? Is it the VLT pair, or any other switch? Can you make sure on the Cat2960 switches using show spanning-tree root command that they both report the same - and correct - root switch for all VLANs?

Deactivating the EtherChannel Misconfig Guard can be performed on your Cat2960 switches using the following global config mode command:

no spanning-tree etherchannel guard misconfig

In any case, if we want to narrow down the problem, we need more information about the processes taking place at the moment the ports are err-disabled. Do your network operation guidelines allow you to run debugs on your switches? If they do, would you mind keeping the EtherChannel Misconfig Guard activated and having these debugs activated on your Cat2960?

debug spanning-tree etherchannel
debug spanning-tree exceptions
debug spanning-tree switch errors
!
debug spanning-tree events
debug spanning-tree general

The first three debugs should only produce output when error conditions occur; the last two produce very minimal output in a stable network and provide more context for the events that possibly take place around the time the ports are err-disabled. These debugs should not impair your switch operation - of course, as with all debugging, caution is always called for.

Best regards,
Peter

EDIT: Added the forgotten no keyword to the command for EtherChannel Misconfig Guard deactivation

Hi Peter,

Below is the output for the commands you asked me to run.

Dell

DellDist1#show spanning-tree vlan 1

Neither PVST nor Rapid-PVST is enabled


DellDist1#show spanning-tree vlan 1681

Neither PVST nor Rapid-PVST is enabled

 

Cisco

C2960S#show spanning-tree root

                                        Root    Hello Max Fwd
Vlan                   Root ID          Cost    Time  Age Dly  Root Port
---------------- -------------------- --------- ----- --- ---  ------------
VLAN0001             0 90b1.1cf4.aa70       503    2   20  15  Po1
VLAN1016         33784 3cce.73ba.9280         3    2   20  15  Po1
VLAN1112         33880 0017.e05e.2880         3    2   20  15  Po1
VLAN1128         33896 0017.e05e.2880         3    2   20  15  Po1
VLAN1136         33904 0017.e05e.2880         3    2   20  15  Po1
VLAN1144         33912 0017.e05e.2880         3    2   20  15  Po1
VLAN1152         33920 0017.e05e.2880         3    2   20  15  Po1
VLAN1681         34449 0017.e05e.2880         3    2   20  15  Po1

----------------------

I have enabled the debugs. Will the output be shown via "show log"?

Cheers once again,

Huw

 

Hi Huw,

It seems as if the Dell switches were not running the per-VLAN RSTP variant. That would mean that your Dell switches are most probably running plain RSTP without VLAN support. I now have a hypothesis that this may be at the core of the issues you are seeing. Let me elaborate.

Cisco switches use per-VLAN STP (or RSTP) variants, running a separate STP instance for each VLAN. To provide interoperability on trunks, STP instance for VLAN1 sends and receives normal IEEE-compatible BPDUs and is therefore able to seamlessly interoperate with all other switches out there, even if they do not speak per-VLAN STP. In addition, all per-VLAN STP instances send and receive their per-VLAN BPDUs in a slightly changed format, and most importantly, addressed to a different group MAC address than normal IEEE BPDUs. To switches that do not support the per-VLAN STP, these per-VLAN BPDUs are just multicast frames flooded throughout the network and not processed by them.

This flooding of per-VLAN BPDUs by non-per-VLAN-STP-aware switches is crucial to understanding what could be happening in your network. From the viewpoint of a Cat2960 switch, BPDUs for VLAN1 arriving through a Port-channel are sourced by the immediate neighboring Dell switch. However, per-VLAN BPDUs arriving through the same Port-channel are sourced by some other Catalyst switch in your network, possibly several "switch hops" away, not by the Dell - the Dell just floods them. Now, if there is a topology change somewhere in your network that affects some other Catalyst switch or switches, they may start sending BPDUs from their different ports until the network stabilizes into a new loop-free topology. Because these BPDUs - quite possibly sourced from different ports on Catalyst switches - are just flooded across the Dell switches, they will all arrive over the same Port-channel, and they trick the EtherChannel Misconfig Guard into thinking that the opposite device is not treating the ports as bundled.
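
To illustrate this hypothesis, here is a small Python sketch. It relies on two well-known destination addresses: 01:80:C2:00:00:00 for standard IEEE BPDUs and 01:00:0C:CC:CC:CD for Cisco's per-VLAN (PVST+) BPDUs. Everything else, including the function name, the MAC addresses and the frame list, is made up for the example, and the model ignores VLAN tagging and timing entirely.

```python
IEEE_STP_DST = "01:80:c2:00:00:00"   # standard IEEE BPDU destination
PVST_PLUS_DST = "01:00:0c:cc:cc:cd"  # Cisco per-VLAN (PVST+) BPDU destination


def sources_seen_downstream(neighbor_mac, upstream_frames):
    """Source MACs a downstream switch sees behind a plain-RSTP neighbor.

    upstream_frames is a list of (src_mac, dst_mac) tuples that reach the
    neighbor from the rest of the network (hypothetical data).
    """
    # The neighbor's own IEEE BPDUs always carry its own source MAC.
    seen = {neighbor_mac}
    for src, dst in upstream_frames:
        if dst.lower() == PVST_PLUS_DST:
            # Not understood by the neighbor, so flooded unchanged:
            # the original source MAC is preserved.
            seen.add(src)
        # IEEE BPDUs from upstream are consumed by the neighbor's own
        # STP process and are not forwarded.
    return seen


frames = [
    ("cccc.cccc.0001", PVST_PLUS_DST),  # remote Catalyst, port 1
    ("cccc.cccc.0002", PVST_PLUS_DST),  # same Catalyst, different port
    ("dddd.dddd.0001", IEEE_STP_DST),   # IEEE BPDU, terminated by the neighbor
]
seen = sources_seen_downstream("aaaa.aaaa.0001", frames)
print(sorted(seen))
```

With one neighbor and two flooded per-VLAN sources, the downstream switch ends up with three distinct BPDU source MACs on a single Port-channel, which is exactly the pattern the EtherChannel Misconfig Guard treats as suspicious.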

The output of the show spanning-tree root command from the Cat2960 suggests that most probably, the Dell switches do not run per-VLAN RSTP - otherwise, they would become root switches in all VLANs. You can see that while VLAN1 has a Dell-based root switch (based on the OUI of 90:b1:1c in the MAC address), all other VLANs use Cisco switches as their respective roots - and even more, VLAN1016 has a different root switch than the other VLANs. All VLANs that have Cisco switches as their root switch in addition have their root switch priority set to 32768 (subtract the VLAN number from the priority shown in the output). Clearly, this is the default configuration of a Cisco switch. These differences indeed suggest that the Dell switches are not running per-VLAN RSTP.
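
The subtraction can be checked quickly with a few lines of Python. The priorities below are copied from your show spanning-tree root output; the helper name is just for illustration. VLAN0001 is left out because its root is the Dell with priority 0.

```python
# (VLAN name, root priority) pairs taken from the show spanning-tree root output
rows = [
    ("VLAN1016", 33784),
    ("VLAN1112", 33880),
    ("VLAN1128", 33896),
    ("VLAN1136", 33904),
    ("VLAN1144", 33912),
    ("VLAN1152", 33920),
    ("VLAN1681", 34449),
]


def base_priority(vlan_name, shown_priority):
    """Strip the VLAN number (the sys-id-ext) from the displayed priority."""
    vlan_id = int(vlan_name[len("VLAN"):])
    return shown_priority - vlan_id


configured = {name: base_priority(name, prio) for name, prio in rows}
for name, prio in configured.items():
    print(name, prio)
```

Every VLAN in the list comes out at 32768, the default bridge priority.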

I believe you should start the RSTP-PV (as Dell calls it) on all your Dell switches. According to the interoperability document you have referenced, the Dell switches should be configured with:

spanning-tree mode rapid-pvst

All your Dell switches should be configured with this command. I recommend performing this change during a maintenance window, as there may be a transient connectivity outage. In addition, make sure that a reasonable switch becomes the root switch for all VLANs. Perhaps the VLT switches should be configured for this. The command should be fairly simple:

spanning-tree vlan 1-4094 priority 0

Just in case you want to proceed with the debugs as well, the debugs will indeed be visible in the show logging command on the Catalyst - but I recommend configuring the buffer for the logs to be bigger and to store all messages including the debugs:

logging buffered 100000 debugging

This command on Catalyst switches will make sure there is 100 Kbytes reserved for the log buffer, and all messages including the debugs will be stored there.
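
If you save the show logging output to a text file, a short script can pick out the err-disable events so you can line them up with the debug messages around them. This is only a sketch: the regular expression is written against the message format in the log lines you posted earlier, and other IOS versions or timestamp settings may format these lines differently.

```python
import re

# Pattern based on the %PM-4-ERR_DISABLE lines posted earlier in this thread.
ERR_DISABLE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:.]+): %PM-4-ERR_DISABLE: "
    r"(?P<cause>.+?) error detected on (?P<detected_on>\S+), "
    r"putting (?P<port>\S+) in err-disable state"
)


def err_disable_events(log_text):
    """Return (timestamp, cause, detected_on, port) for each err-disable line."""
    events = []
    for line in log_text.splitlines():
        match = ERR_DISABLE.match(line.strip())
        if match:
            events.append(match.group("ts", "cause", "detected_on", "port"))
    return events


sample = """\
Apr 10 07:01:16.463: %LINK-3-UPDOWN: Interface Port-channel1, changed state to down
Apr 10 07:01:22.125: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi1/0/51, putting Gi1/0/51 in err-disable state
Apr 10 07:01:22.230: %PM-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po2, putting Po2 in err-disable state
"""
events = err_disable_events(sample)
for event in events:
    print(event)
```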

I suggest starting the RSTP-PV on your Dell switches as the first step and seeing if the issue stops occurring.

Best regards,
Peter

 

Thanks again for the excellent response. I think we are going to disable the misconfig guard as it's the simplest solution.

Just to be clear, is this the command to disable the EtherChannel misconfig guard?

no spanning-tree etherchannel guard misconfig

From what I can see:-

spanning-tree etherchannel guard misconfig

Enables the guard.

Cheers,

-H

Hello Huw,

I think we are going to disable the misconfig guard as it's the simplest solution.

It may be the simplest solution but in the long run, it is definitely not the best solution. You are basically choosing a workaround for a more profound problem of running different versions of STP in a single switched topology. I understand that deactivating the EtherChannel Misconfig Guard may be the most straightforward solution for the moment but please consider also aligning the STP versions used in your network.

Ideally, I would recommend running MST as that is the preferred STP version for multivendor environments. It requires a bit of pre-planning but in the long run, it proves beneficial.

The least disruptive STP streamlining, however, would be to simply activate the RSTP-PV on your Dell switches. While requiring a maintenance window, it should be very straightforward to implement.

Please note that currently, you do not really have your switched environment under control. The output of show spanning-tree root shows very nicely that in VLANs other than VLAN1, the per-VLAN RSTP is running on the default (unconfigured) values, ending up with random Catalyst switches being root switches. In addition, there is no way you can make your Dell switches root switches in these VLANs unless you run RSTP-PV on them. While your network works right now, it is not the way it should be left running for the foreseeable future.

So I encourage you to deactivate the EtherChannel Misconfig Guard for the time being to avoid the outages caused by it, but I also encourage you even more to align the STP versions running on your switches so that only a single STP version is run across your network.

Anyway, yes, you're right - the correct command to deactivate the EtherChannel Misconfig Guard is no spanning-tree etherchannel guard misconfig. I forgot to put in the no keyword in my previous responses - I've corrected them as well. Thanks!

Best regards,
Peter
