01-09-2013 02:15 PM - edited 03-07-2019 11:00 AM
We have encountered issues where a 4500 chassis switch, connected to a pair of 6509s and running udld port aggressive on its interfaces, gets dropped off the network when the 4500 crashes and reloads. Apparently the 6500s see the loss of UDLD packets and put their interfaces into the err-disabled state. This could likely happen with other access layer switches, but our experience has been with the 4500. Manual intervention is required to get the access switch back online with its upstream neighbors. Does anyone know of a tweak, nerd knob, or other workaround to get past this issue?
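For reference, the uplinks are configured along these lines (interface numbers and descriptions below are only illustrative, not our actual configuration):

! Access-switch uplink config sketch - interface names are placeholders
interface GigabitEthernet1/1
 description Uplink to 6509-1
 udld port aggressive
!
interface GigabitEthernet1/2
 description Uplink to 6509-2
 udld port aggressive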
01-09-2013 02:23 PM
You can set it so that an err-disabled port tries to re-enable itself after a given period of time. You do still need to determine why it goes err-disabled; it should not be doing that. The 4500 should not be crashing either.
In order to turn on errdisable recovery and choose the errdisable conditions, issue this command:
cat6knative#errdisable recovery cause ?
  all                  Enable timer to recover from all causes
  arp-inspection       Enable timer to recover from arp inspection error disable state
  bpduguard            Enable timer to recover from BPDU Guard error disable state
  channel-misconfig    Enable timer to recover from channel misconfig disable state
  dhcp-rate-limit      Enable timer to recover from dhcp-rate-limit error disable state
  dtp-flap             Enable timer to recover from dtp-flap error disable state
  gbic-invalid         Enable timer to recover from invalid GBIC error disable state
  l2ptguard            Enable timer to recover from l2protocol-tunnel error disable state
  link-flap            Enable timer to recover from link-flap error disable state
  mac-limit            Enable timer to recover from mac limit disable state
  pagp-flap            Enable timer to recover from pagp-flap error disable state
  psecure-violation    Enable timer to recover from psecure violation disable state
  security-violation   Enable timer to recover from 802.1x violation disable state
  udld                 Enable timer to recover from udld error disable state
  unicast-flood        Enable timer to recover from unicast flood disable state
This example shows how to enable the BPDU guard errdisable recovery condition:
cat6knative(Config)#errdisable recovery cause bpduguard
A nice feature of this command is that, if you enable errdisable recovery, the command lists general reasons that the ports have been put into the error-disable state. In this example, notice that the BPDU guard feature was the reason for the shutdown of port 2/4:
cat6knative#show errdisable recovery
ErrDisable Reason    Timer Status
-----------------    --------------
udld                 Disabled
bpduguard            Enabled
security-violatio    Disabled
channel-misconfig    Disabled
pagp-flap            Disabled
dtp-flap             Disabled
link-flap            Disabled
l2ptguard            Disabled
psecure-violation    Disabled
gbic-invalid         Disabled
dhcp-rate-limit      Disabled
mac-limit            Disabled
unicast-flood        Disabled
arp-inspection       Disabled

Timer interval: 300 seconds

Interfaces that will be enabled at the next timeout:

Interface    Errdisable reason    Time left(sec)
---------    -----------------    --------------
Fa2/4        bpduguard            290

If any one of the errdisable recovery conditions is enabled, the ports with this condition are reenabled after 300 seconds. You can also change this default of 300 seconds if you issue this command:
cat6knative(Config)#errdisable recovery interval timer_interval_in_seconds

This example changes the errdisable recovery interval from 300 to 400 seconds:
cat6knative(Config)#errdisable recovery interval 400
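For the UDLD case discussed in this thread specifically, a minimal sketch would be the following (the 30-second interval is just an example value, not a recommendation):

cat6knative(Config)#errdisable recovery cause udld
cat6knative(Config)#errdisable recovery interval 30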
01-10-2013 04:53 AM
Thanks for the post; we already use these recovery features.
01-09-2013 02:34 PM
will get dropped off the network in the event of a crash and reload of the 4500.
If the port goes into "err-disable" I want to see WHY. Can you post the output to the command "sh interface status error"?
01-10-2013 04:57 AM
The command 'show interface status err-disabled' yields no information since none of the interfaces are currently in that state. However, the log of the adjacent 6500 shows the following:
Apr 19 07:37:04.463: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/29, changed state to down
Apr 19 07:37:04.471: %LINK-3-UPDOWN: Interface GigabitEthernet2/29, changed state to down
Apr 19 07:37:04.459: %UDLD-SP-4-UDLD_PORT_DISABLED: UDLD disabled interface Gi2/29, aggressive mode failure detected
Apr 19 07:37:04.459: %PM-SP-4-ERR_DISABLE: udld error detected on Gi2/29, putting Gi2/29 in err-disable state
This likely indicates a loss of UDLD packets, and the resulting determination that the link was unidirectional.
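For reference, these are the kinds of commands to check the UDLD and err-disable state on the 6500 side (the hostname prompt below is just a placeholder):

6509#show udld GigabitEthernet 2/29
6509#show interfaces status err-disabled
6509#show errdisable recovery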
01-10-2013 01:57 PM
Remove UDLD from both interfaces.
Is G2/29 a fiber port?
01-11-2013 05:47 AM
Yes, G2/29 is a fiber port.
01-11-2013 02:07 PM
Yes, G2/29 is a fiber port.
Ok, I think you've got a fibre issue. Could be a fault with the fibre link itself, patch cables, modules. Before you start looking for it, can you post the output to the command "sh interface
01-13-2013 03:57 AM
My original post explained how both uplinks to an access layer switch went err-disabled following a crash/reload of a 4500. This is not a fiber issue, as we have seen identical outcomes from crash/reload events on other devices. Controlled reloads do not trigger the err-disable event.
04-05-2013 06:09 AM
Daniel,
Other than an actual physical layer problem (which you said is not the case here), the most likely scenario for a "false positive" with UDLD is high CPU on either side of the link. That's one reason UDLD aggressive mode is sometimes not recommended: it is more prone to false positives.
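If the false positives keep happening and you still want UDLD on those uplinks, one option is to drop back from aggressive to normal mode on both ends of each uplink. A sketch for the 6500 side, using the port from your log:

interface GigabitEthernet2/29
 no udld port aggressive
 udld port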
I would recommend getting the crash looked at by TAC (assuming this still happens; I know the post is a few months old). It sounds like whatever is happening during the 4500's crash is affecting its ability to send UDLD frames. UDLD frames are generated and received by the CPU, so if the 4500 is crashing due to a high CPU condition, it's possible the CPU is too busy to send the UDLD frames, which is why the 6500 is err-disabling its ports.
Checking 'show proc cpu history' would show you what the CPU looked like. A crashinfo file may have that output (I don't know offhand with the 4500). Either way, TAC can look at the tracebacks in the 4500's crash file to determine whether it was most likely due to high CPU, in which case the UDLD issue is just a symptom of a larger problem where traffic is being software-switched by the CPU. The following doc has more information on high CPU on the 4500 if you are interested --
http://www.cisco.com/en/US/products/hw/switches/ps663/products_tech_note09186a00804cef15.shtml
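If you want to dig into it yourself first, the commands from that doc I would start with on the 4500 are along these lines (the hostname prompt is just a placeholder):

4500#show processes cpu sorted
4500#show processes cpu history
4500#show platform health
4500#show platform cpu packet statistics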
David Kosich