VSS issues

carl_townshend
Spotlight

Hi all,

We are having some issues with our 4500X switches running VSS. The version is 03.06.06.E.

Basically, the VSS stops working for a few minutes and then comes back up. There is no power loss on either switch and no loss of network connectivity, and the two switches are directly connected. The logs are below.

 

001851: Oct 30 16:37:26.511: %EC-5-UNBUNDLE: Interface TenGigabitEthernet1/1/16 left the port-channel Port-channel1
001852: Oct 30 16:37:26.523: %VSLP-3-VSLP_LMP_FAIL_REASON: Te1/1/16: Link down
001853: Oct 30 16:37:26.523: %EC-5-UNBUNDLE: Interface TenGigabitEthernet2/1/16 left the port-channel Port-channel2
001854: Oct 30 16:37:26.538: %EC-5-UNBUNDLE: Interface TenGigabitEthernet1/1/15 left the port-channel Port-channel1
001855: Oct 30 16:37:26.550: %VSLP-3-VSLP_LMP_FAIL_REASON: Te1/1/15: Link down
001856: Oct 30 16:37:26.550: %VSLP-2-VSL_DOWN:   All VSL links went down while switch is in ACTIVE role
001857: Oct 30 16:37:26.550: %EC-5-UNBUNDLE: Interface TenGigabitEthernet2/1/15 left the port-channel Port-channel2
001858: Oct 30 16:37:26.640: % Service-policy attached to VSL member ports cannot be modified or removed except defaulting VSL members.
001859: Oct 30 16:37:26.661: % Service-policy attached to VSL member ports cannot be modified or removed except defaulting VSL members.
001860: Oct 30 16:37:26.828: % Service-policy attached to VSL member ports cannot be modified or removed except defaulting VSL members.
001861: Oct 30 16:37:26.836: % Service-policy attached to VSL member ports cannot be modified or removed except defaulting VSL members.
001862: Oct 30 16:37:27.123: %C4K_REDUNDANCY-3-COMMUNICATION: Communication with the peer Supervisor has been lost
001863: Oct 30 16:37:27.130: %EC-5-UNBUNDLE: Interface TenGigabitEthernet2/1/1 left the port-channel Port-channel3
001864: Oct 30 16:37:27.132: %EC-5-UNBUNDLE: Interface TenGigabitEthernet2/1/2 left the port-channel Port-channel6
001865: Oct 30 16:37:27.133: %EC-5-UNBUNDLE: Interface TenGigabitEthernet2/1/3 left the port-channel Port-channel7
001866: Oct 30 16:37:27.139: %EC-5-UNBUNDLE: Interface TenGigabitEthernet2/1/6 left the port-channel Port-channel5
001867: Oct 30 16:37:27.140: %EC-5-UNBUNDLE: Interface TenGigabitEthernet2/1/7 left the port-channel Port-channel8
001868: Oct 30 16:37:27.142: %EC-5-UNBUNDLE: Interface TenGigabitEthernet2/1/8 left the port-channel Port-channel4
001869: Oct 30 16:37:27.147: %EC-5-UNBUNDLE: Interface TenGigabitEthernet2/1/9 left the port-channel Port-channel5
001870: Oct 30 16:37:27.193: % Service-policy attached to VSL member ports cannot be modified or removed except defaulting VSL members.
001871: Oct 30 16:37:27.201: % Service-policy attached to VSL member ports cannot be modified or removed except defaulting VSL members.
001872: Oct 30 16:37:27.206: %C4K_REDUNDANCY-3-SIMPLEX_MODE: The peer Supervisor has been lost
001873: Oct 30 16:38:06.840: %OSPF-5-ADJCHG: Process 100, Nbr 172.31.129.145 on Vlan4009 from FULL to DOWN, Neighbor Down: Dead timer expired
001874: Oct 30 16:38:16.440: %UDLD-4-UDLD_PORT_DISABLED: UDLD disabled interface Te1/1/8, aggressive mode failure detected
001875: Oct 30 16:38:16.440: %PM-4-ERR_DISABLE: udld error detected on Te1/1/8, putting Te1/1/8 in err-disable state
001876: Oct 30 16:38:16.444: %EC-5-UNBUNDLE: Interface TenGigabitEthernet1/1/8 left the port-channel Port-channel4
001877: Oct 30 16:39:09.221: %OSPF-5-ADJCHG: Process 100, Nbr 172.31.129.145 on Vlan4009 from LOADING to FULL, Loading Done
001878: Oct 30 16:42:15.605: %C4K_IOSINTF-5-LMPHWSESSIONSTATE: Lmp HW session UP on slot 1 port 15.
001879: Oct 30 16:42:17.845: %C4K_IOSINTF-5-LMPHWSESSIONSTATE: Lmp HW session UP on slot 1 port 16.
001880: Oct 30 16:42:31.612: %VSLP-5-VSL_UP:  Ready for control traffic
001881: Oct 30 16:42:36.617: %VSLP-5-RRP_ROLE_RESOLVED: Role resolved as ACTIVE  by VSLP
001882: Oct 30 16:42:36.617: %EC-5-BUNDLE: Interface TenGigabitEthernet1/1/15 joined port-channel Port-channel1
001883: Oct 30 16:42:36.629: %EC-5-BUNDLE: Interface TenGigabitEthernet1/1/16 joined port-channel Port-channel1
001884: Oct 30 16:42:37.144: %C4K_REDUNDANCY-6-DUPLEX_MODE: The peer Supervisor has been detected
001885: Oct 30 16:43:15.977: %C4K_REDUNDANCY-6-MODE: ACTIVE supervisor initializing for sso mode
001886: Oct 30 16:43:16.150: %C4K_REDUNDANCY-3-COMMUNICATION: Communication with the peer Supervisor has been established
001887: Oct 30 16:43:16.433: %PM-4-ERR_RECOVER: Attempting to recover from udld err-disable state on Te1/1/8
001888: Oct 30 16:43:16.821: %EC-5-BUNDLE: Interface TenGigabitEthernet1/1/8 joined port-channel Port-channel4
001889: Oct 30 16:43:26.756: %C4K_REDUNDANCY-5-CONFIGSYNC: The bootvar has been successfully synchronized to the standby supervisor
001890: Oct 30 16:43:26.757: %C4K_REDUNDANCY-5-CONFIGSYNC: The config-reg has been successfully synchronized to the standby supervisor
001891: Oct 30 16:43:26.759: %C4K_REDUNDANCY-5-CONFIGSYNC: The startup-config has been successfully synchronized to the standby supervisor
001892: Oct 30 16:43:27.244: %C4K_REDUNDANCY-5-CONFIGSYNC: The private-config has been successfully synchronized to the standby supervisor
001893: Oct 30 16:43:28.036: %C4K_REDUNDANCY-5-CONFIGSYNC_RATELIMIT: The vlan database has been successfully synchronized to the standby supervisor
001894: Oct 30 16:44:11.822: %EC-5-BUNDLE: Interface TenGigabitEthernet2/1/15 joined port-channel Port-channel2
001895: Oct 30 16:44:11.843: %EC-5-BUNDLE: Interface TenGigabitEthernet2/1/16 joined port-channel Port-channel2
001896: Oct 30 16:44:15.049: %HA_CONFIG_SYNC-6-BULK_CFGSYNC_SUCCEED: Bulk Sync succeeded
001897: Oct 30 16:44:15.821: %EC-5-BUNDLE: Interface TenGigabitEthernet2/1/1 joined port-channel Port-channel3
001898: Oct 30 16:44:15.826: %EC-5-BUNDLE: Interface TenGigabitEthernet2/1/2 joined port-channel Port-channel6
001899: Oct 30 16:44:15.836: %EC-5-BUNDLE: Interface TenGigabitEthernet2/1/3 joined port-channel Port-channel7
001900: Oct 30 16:44:15.856: %EC-5-BUNDLE: Interface TenGigabitEthernet2/1/6 joined port-channel Port-channel5
001901: 000027: Oct 30 16:44:15.013: %C4K_IOSMODPORTMAN-6-MODULEONLINE: STANDBY:Module 1 (WS-C4500X-16 S/N: JAE21120B7K Hw: 1.1) is online
001902: 000028: Oct 30 16:44:15.013: %C4K_IOSMODPORTMAN-6-MODULEONLINE: STANDBY:Module 11 (WS-C4500X-16 S/N: JAE21120ARC Hw: 1.1) is online
001903: Oct 30 16:44:15.906: %EC-5-BUNDLE: Interface TenGigabitEthernet2/1/8 joined port-channel Port-channel4
001904: Oct 30 16:44:15.911: %EC-5-BUNDLE: Interface TenGigabitEthernet2/1/9 joined port-channel Port-channel5
001905: Oct 30 16:44:16.079: %RF-5-RF_TERMINAL_STATE: 1 ha_mgr:  Terminal state reached for (SSO)
001906: 000029: Oct 30 16:44:15.835: %EC-5-BUNDLE: STANDBY:Interface TenGigabitEthernet2/1/1 joined port-channel Port-channel3
001907: 000030: Oct 30 16:44:15.859: %EC-5-BUNDLE: STANDBY:Interface TenGigabitEthernet2/1/2 joined port-channel Port-channel6
001908: 000031: Oct 30 16:44:15.865: %EC-5-BUNDLE: STANDBY:Interface TenGigabitEthernet2/1/3 joined port-channel Port-channel7
001909: 000032: Oct 30 16:44:15.949: %EC-5-BUNDLE: STANDBY:Interface TenGigabitEthernet2/1/6 joined port-channel Port-channel5
001910: 000033: Oct 30 16:44:15.964: %EC-5-BUNDLE: STANDBY:Interface TenGigabitEthernet2/1/8 joined port-channel Port-channel4
001911: 000034: Oct 30 16:44:16.071: %EC-5-BUNDLE: STANDBY:Interface TenGigabitEthernet2/1/9 joined port-channel Port-channel5
001912: Oct 30 16:44:17.838: %EC-5-BUNDLE: Interface TenGigabitEthernet2/1/7 joined port-channel Port-channel8
001913: 000035: Oct 30 16:44:17.871: %EC-5-BUNDLE: STANDBY:Interface TenGigabitEthernet2/1/7 joined port-channel Port-channel8
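
For anyone reading along, the "few mins" can be measured straight from the timestamps. Below is a minimal Python sketch (not a Cisco tool) that takes four lines copied from the log above and computes the time from VSL_DOWN to VSL_UP and from VSL_DOWN to the SSO terminal state. The helper names `first_seen` and `seconds_between` are made up for illustration, and the sketch assumes all events fall on the same day, since the log carries no year.

```python
import re
from datetime import datetime

# Four lines copied verbatim from the log posted in this thread.
LOG = """\
001856: Oct 30 16:37:26.550: %VSLP-2-VSL_DOWN:   All VSL links went down while switch is in ACTIVE role
001872: Oct 30 16:37:27.206: %C4K_REDUNDANCY-3-SIMPLEX_MODE: The peer Supervisor has been lost
001880: Oct 30 16:42:31.612: %VSLP-5-VSL_UP:  Ready for control traffic
001905: Oct 30 16:44:16.079: %RF-5-RF_TERMINAL_STATE: 1 ha_mgr:  Terminal state reached for (SSO)
"""

# Sequence number(s), timestamp, then the %FACILITY-SEVERITY-MNEMONIC tag.
LINE_RE = re.compile(
    r"^\d+: (?:\d+: )?(\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"%([A-Z0-9_]+-\d-[A-Z0-9_]+):"
)


def first_seen(log_text):
    """Map each message tag to the first timestamp at which it appears."""
    seen = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        # No year in the log: strptime falls back to 1900, fine for same-day deltas.
        ts = datetime.strptime(m.group(1), "%b %d %H:%M:%S.%f")
        seen.setdefault(m.group(2), ts)
    return seen


def seconds_between(seen, start_tag, end_tag):
    """Elapsed seconds from the first start_tag event to the first end_tag event."""
    return (seen[end_tag] - seen[start_tag]).total_seconds()


seen = first_seen(LOG)
vsl_outage = seconds_between(seen, "VSLP-2-VSL_DOWN", "VSLP-5-VSL_UP")
sso_restore = seconds_between(seen, "VSLP-2-VSL_DOWN", "RF-5-RF_TERMINAL_STATE")
print(f"VSL down for {vsl_outage:.3f} s; SSO restored after {sso_restore:.3f} s")
```

On these lines the VSL was down for roughly 305 seconds, and the pair was back in SSO roughly 410 seconds after the VSL dropped.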

 

8 Replies

Leo Laohoo
Hall of Fame

@carl_townshend wrote:

001887: Oct 30 16:43:16.433: %PM-4-ERR_RECOVER: Attempting to recover from udld err-disable state on Te1/1/8


Seriously?  Auto-recovery when UDLD kicks in is enabled?  

 

Carl, 

Post the complete output of the command "sh redundancy".

Hi,

What is the issue with having UDLD enabled?

Please see the "show redundancy" output below.

sh redundancy
Redundant System Information :
------------------------------
       Available system uptime = 1 year, 19 weeks, 2 days, 13 hours, 33 minutes
Switchovers system experienced = 0
              Standby failures = 12
        Last switchover reason = none
                 Hardware Mode = Duplex
    Configured Redundancy Mode = Stateful Switchover
     Operating Redundancy Mode = Stateful Switchover
              Maintenance Mode = Disabled
                Communications = Up
Current Processor Information :
------------------------------
               Active Location = slot 1/1
        Current Software state = ACTIVE
       Uptime in current state = 1 year, 19 weeks, 2 days, 13 hours, 31 minutes
                 Image Version = Cisco IOS Software, IOS-XE Software, Catalyst 4500 L3 Switch  Software (cat4500e-UNIVERSALK9-M), Version 03.06.06.E RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2016 by Cisco Systems, Inc.
Compiled Fri 16-Dec-16 21:17 by prod
               BOOT = bootflash:cat4500e-universalk9.SPA.03.06.06.E.152-2.E6.bin,12;
        Configuration register = 0x2101
Peer Processor Information :
------------------------------
              Standby Location = slot 2/1
        Current Software state = STANDBY HOT
       Uptime in current state = 20 hours, 1 minute
                 Image Version = Cisco IOS Software, IOS-XE Software, Catalyst 4500 L3 Switch  Software (cat4500e-UNIVERSALK9-M), Version 03.06.06.E RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2016 by Cisco Systems, Inc.
Compiled Fri 16-Dec-16 21:17 by pr
               BOOT = bootflash:cat4500e-universalk9.SPA.03.06.06.E.152-2.E6.bin,12;
        Configuration register = 0x2101

You are hitting UDLD err-disable. Check your fiber connection, or disable the UDLD check.

Why would you disable UDLD? It is there for a reason: to protect against a single fiber going down.

UDLD is doing its job if it sees a link go down.

Carl,
UDLD is a life saver. Think about it: Once it detects a potential issue with the link it will bring it down. Why? Because if this link happens to be carrying very large routing tables, you don't want this link to be flapping or it'll kill the CPU.
Look at the output of the "sh redundancy". Notice that the 2nd card has an uptime of 20 hours? I bet I know what caused that.
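
To put numbers on that uptime gap, here is a small Python sketch that converts the two "Uptime in current state" strings from the "sh redundancy" output into seconds. The function name `uptime_seconds` is hypothetical, and treating a year as 365 days is an assumption made only for this comparison.

```python
import re

# Seconds per unit. Counting a year as 365 days is an assumption of this sketch.
UNIT_SECONDS = {
    "year": 365 * 86400,
    "week": 7 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
}


def uptime_seconds(text):
    """Convert an uptime string such as '20 hours, 1 minute' to seconds."""
    total = 0
    for count, unit in re.findall(r"(\d+) (year|week|day|hour|minute)s?", text):
        total += int(count) * UNIT_SECONDS[unit]
    return total


# Strings copied from the "sh redundancy" output posted in this thread.
active = uptime_seconds("1 year, 19 weeks, 2 days, 13 hours, 31 minutes")
peer = uptime_seconds("20 hours, 1 minute")
print(f"active: {active / 3600:.1f} h, peer: {peer / 3600:.1f} h")
```

The active side has been in its current state for over a year while the peer has been up for about 20 hours, which lines up with the non-zero "Standby failures" counter in the same output.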

If UDLD kicks in, tough luck.  Take the time to investigate WHY UDLD got triggered.  Enabling the auto-recovery is just sweeping the problem under the rug.
Common sense must be used: If UDLD is enabled, disable the auto-recovery. If auto-recovery is enabled then disable UDLD.

NOTE:  Please note the config-register value is 0x2101.

Hi Leo,

When you say you bet you know what caused it, what are your thoughts?

 


@carl_townshend wrote:

When you say you bet you know what caused it, what are your thoughts?


Carl, 

The secondary line card has a very low uptime.  This means the line card either lost power (unlikely) or crashed, and I'm leaning towards a crash.

UDLD is enabled.  And someone then enabled auto-recovery for links that UDLD has err-disabled.  This is what usually happens:

Let's presume you've got a VSS pair and they are doing BGP routing. 

1.  UDLD detects a failure & one of the links goes into error-disabled state;

NOTE:  Let's just say that that link is the PRIMARY path to the internet. 

2.  This means everything stops; 

3.  30 seconds later, auto-recovery brings the link back up;

4.  Guess what:  BGP routes and advertisements come flooding in;

5.  When this happens the first thing that gets hit is the CPU; 

6.  After a few minutes, repeat #1 to #5 for about four to five "cycles".

Now what do you think is going to happen to the supervisor card?  

Another thing:  If this has been happening for a while, look inside the crashinfo or coredump folder.

Prove me wrong, Carl:  Disable UDLD auto-recovery and then investigate which link(s) go into UDLD error-disable.  Try it for four days.  
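
As a rough aid for that investigation, the Python sketch below scans saved syslog text for UDLD err-disable events, counts them per interface, and measures how long each port stayed down before the recovery attempt. It is only an illustration built around the message formats seen in this thread. The function name `udld_events` is made up, and the sample lines are copied from the original post.

```python
import re
from collections import Counter
from datetime import datetime

# Lines copied verbatim from the log posted in this thread.
LOG = """\
001874: Oct 30 16:38:16.440: %UDLD-4-UDLD_PORT_DISABLED: UDLD disabled interface Te1/1/8, aggressive mode failure detected
001875: Oct 30 16:38:16.440: %PM-4-ERR_DISABLE: udld error detected on Te1/1/8, putting Te1/1/8 in err-disable state
001887: Oct 30 16:43:16.433: %PM-4-ERR_RECOVER: Attempting to recover from udld err-disable state on Te1/1/8
"""

TS = r"(\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3})"
DISABLE_RE = re.compile(TS + r": %PM-4-ERR_DISABLE: udld error detected on ([^\s,]+),")
RECOVER_RE = re.compile(
    TS + r": %PM-4-ERR_RECOVER: Attempting to recover from udld err-disable state on (\S+)"
)


def parse_ts(text):
    # No year in the log: strptime falls back to 1900, fine for same-day deltas.
    return datetime.strptime(text, "%b %d %H:%M:%S.%f")


def udld_events(log_text):
    """Return (err-disable count per interface, [(interface, seconds until recovery attempt)])."""
    counts = Counter()
    pending = {}      # interface -> time it was err-disabled
    recoveries = []
    for line in log_text.splitlines():
        m = DISABLE_RE.search(line)
        if m:
            counts[m.group(2)] += 1
            pending[m.group(2)] = parse_ts(m.group(1))
            continue
        m = RECOVER_RE.search(line)
        if m and m.group(2) in pending:
            started = pending.pop(m.group(2))
            elapsed = (parse_ts(m.group(1)) - started).total_seconds()
            recoveries.append((m.group(2), elapsed))
    return counts, recoveries


counts, recoveries = udld_events(LOG)
for interface, elapsed in recoveries:
    print(f"{interface}: err-disabled {counts[interface]} time(s), recovery attempted after {elapsed:.3f} s")
```

In the posted log only Te1/1/8 shows up, with a single err-disable and a recovery attempt roughly 300 seconds later. Run against a longer log, a port that appears repeatedly is the one to check first.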

The UDLD protocol allows devices connected through fiber-optic or copper Ethernet cables (for example, Category 5 cabling) to monitor the physical configuration of the cables and detect when a unidirectional link exists. When a unidirectional link is detected, UDLD shuts down the affected port and alerts the user.

A unidirectional link occurs whenever traffic transmitted by the local device over a link is received by the neighbor but traffic transmitted from the neighbor is not received by the local device. For example, if one of the fiber strands in a pair is disconnected, as long as autonegotiation is active the link does not stay up. In this situation, the logical link is undetermined, and UDLD does not take any actions. If both fibers are working normally at Layer 1, then UDLD at Layer 2 determines whether those fibers are connected correctly and whether traffic is flowing bidirectionally between the correct neighbors.

I posted the UDLD explanation because, knowing how it works, the best way to isolate the fault is to move the same cable to another port with UDLD enabled. If the cable is the cause, the UDLD failure will follow it. Short of running diagnostics on the line card by re-seating it, there is no command to see the health of the port. Diagnostics run at boot-up, not online.

Please rate if it helps.