UDLD Aggressive

Daniel Smith
Level 1

We have encountered issues where a 4500 chassis switch, connected to a pair of 6509s and running udld port aggressive on its interfaces, will get dropped off the network in the event of a crash and reload of the 4500. Apparently the 6500s see the loss of UDLD packets and put their interfaces into the err-disabled state. This could likely happen with other access layer switches, but our experience has been with the 4500. Manual intervention is required to get the access switch back online with its upstream neighbors. I wonder if anyone knows of a tweak, nerd knob, or other workaround to get past this issue?
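For reference, a minimal sketch of the interface configuration in question, with a hypothetical interface name and device prompt (the actual uplink ports would be used):

! hypothetical uplink interface on the 4500
4500(Config)#interface GigabitEthernet1/1
4500(Config-if)#udld port aggressive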

9 Replies

glen.grant
VIP Alumni

You can set it so that an err-disabled port tries to re-enable after a given period of time. You do need to determine why it goes err-disabled, though; it should not be doing that. The 4500 should not be crashing, either.

In order to turn on errdisable recovery and choose the errdisable conditions, issue this command:

cat6knative(Config)#errdisable recovery cause ?
  all                 Enable timer to recover from all causes
  arp-inspection      Enable timer to recover from arp inspection error disable
                      state
  bpduguard           Enable timer to recover from BPDU Guard error disable
                      state
  channel-misconfig   Enable timer to recover from channel misconfig disable
                      state
  dhcp-rate-limit     Enable timer to recover from dhcp-rate-limit error
                      disable state
  dtp-flap            Enable timer to recover from dtp-flap error disable state
  gbic-invalid        Enable timer to recover from invalid GBIC error disable
                      state
  l2ptguard           Enable timer to recover from l2protocol-tunnel error
                      disable state
  link-flap           Enable timer to recover from link-flap error disable
                      state
  mac-limit           Enable timer to recover from mac limit disable state
  pagp-flap           Enable timer to recover from pagp-flap error disable
                      state
  psecure-violation   Enable timer to recover from psecure violation disable
                      state
  security-violation  Enable timer to recover from 802.1x violation disable
                      state
  udld                Enable timer to recover from udld error disable state
  unicast-flood       Enable timer to recover from unicast flood disable state

This example shows how to enable the BPDU guard errdisable recovery condition:

cat6knative(Config)#errdisable recovery cause bpduguard

A nice feature of this command is that, if you enable errdisable recovery, the command lists the general reasons that ports have been put into the error-disable state. In this example, notice that the BPDU guard feature was the reason for the shutdown of port 2/4:

cat6knative#show errdisable recovery
ErrDisable Reason    Timer Status
-----------------    --------------
udld                 Disabled
bpduguard            Enabled
security-violatio    Disabled
channel-misconfig    Disabled
pagp-flap            Disabled
dtp-flap             Disabled
link-flap            Disabled
l2ptguard            Disabled
psecure-violation    Disabled
gbic-invalid         Disabled
dhcp-rate-limit      Disabled
mac-limit            Disabled
unicast-flood        Disabled
arp-inspection       Disabled

Timer interval: 300 seconds

Interfaces that will be enabled at the next timeout:

Interface      Errdisable reason      Time left(sec)
---------    ---------------------    --------------
  Fa2/4                bpduguard          290

If any one of the errdisable recovery conditions is enabled, the ports with this condition are reenabled after 300 seconds. You can also change this default of 300 seconds if you issue this command:

cat6knative(Config)#errdisable recovery interval timer_interval_in_seconds

This example changes the errdisable recovery interval from 300 to 400 seconds:

cat6knative(Config)#errdisable recovery interval 400
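For the UDLD failure described in the original post specifically, the same mechanism applies. A minimal sketch, reusing the prompt from the examples above (30 seconds is the lowest interval the CLI accepts; the valid range is 30 to 86400 seconds):

! recover udld err-disabled ports automatically
cat6knative(Config)#errdisable recovery cause udld
cat6knative(Config)#errdisable recovery interval 30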

Thanks for the post; we do use these recovery features already.

Leo Laohoo
Hall of Fame
will get dropped off the network in the event of a crash and reload of the 4500.

If the port goes into "err-disable" I want to see WHY.  Can you post the output to the command "sh interface status err-disabled"?

The command 'show interface status err-disabled' yields no information, since none of the interfaces are currently in that state. However, the log of the adjacent 6500 shows the following:

Apr 19 07:37:04.463: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/29, changed state to down

Apr 19 07:37:04.471: %LINK-3-UPDOWN: Interface GigabitEthernet2/29, changed state to down

Apr 19 07:37:04.459: %UDLD-SP-4-UDLD_PORT_DISABLED: UDLD disabled interface Gi2/29, aggressive mode failure detected

Apr 19 07:37:04.459: %PM-SP-4-ERR_DISABLE: udld error detected on Gi2/29, putting Gi2/29 in err-disable state

This likely indicates a loss of UDLD packets and the resulting determination that the link was unidirectional.
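One way to confirm what UDLD recorded for that port is its per-interface status on the 6500 (hypothetical prompt; output omitted):

6500#show udld GigabitEthernet2/29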

Remove UDLD from both interfaces.
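If keeping UDLD is preferred over removing it outright, a softer option is normal (non-aggressive) mode on both ends; in normal mode, a simple loss of UDLD packets marks the port undetermined rather than err-disabling it. A sketch, with a hypothetical prompt and the interface from the log above:

! drop back from aggressive to normal mode
6500(Config)#interface GigabitEthernet2/29
6500(Config-if)#no udld port aggressive
6500(Config-if)#udld port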


Is G2/29 a fiber port?

Yes, G2/29 is a fiber port.

Ok, I think you've got a fibre issue.  It could be a fault with the fibre link itself, the patch cables, or the modules.  Before you start looking for it, can you post the output to the command "sh interface " of both sides?  I want to see if I can "catch" any line errors.

My original post explained how both uplinks to an access layer switch went err-disabled following a crash/reload of a 4500. This is not a fiber issue as we have tracked identical outcomes from other crash/reload events on other devices. Controlled reloads do not result in the disabled event.

David Kosich
Level 1

Daniel,

Other than an actual physical layer problem (which you said is not the case here), the most likely scenario for a "false positive" with UDLD is high CPU on either side of the link. That's one reason UDLD aggressive is sometimes not recommended: it is more prone to false positives.

I would recommend getting the crash looked at by TAC (assuming this still happens; I know the post is a few months old). It sounds like whatever happens during the 4500's crash is affecting the 4500's ability to send UDLD frames. UDLD frames are generated and received by the CPU, so if the 4500 is crashing due to a high-CPU condition, it's possible the CPU is too busy to send the UDLD frames, which would explain why the 6500 is err-disabling its ports.

Checking 'show proc cpu history' would show you what the CPU looked like. A crashinfo file may have that output (I don't know offhand with the 4500). Either way, TAC can look at the tracebacks in the 4500's crash file to determine whether it was most likely due to high CPU, in which case the UDLD issue is just a symptom of a larger problem where traffic is being software-switched by the CPU. The following doc has more information on high CPU on the 4500 if you are interested:

http://www.cisco.com/en/US/products/hw/switches/ps663/products_tech_note09186a00804cef15.shtml
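A couple of stock IOS commands for that check, with a hypothetical prompt (the sorted form helps spot the busiest processes):

4500#show processes cpu history
4500#show processes cpu sorted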

David Kosich
