Catalyst 3850 stack members lost link

danilov.do
Level 1

Hi!

Today I ran into an unexpected loss of the stack link between stack members. The stack is now in a consistent state. How do I troubleshoot a problem like this?

I also find it strange that switch #1 restarted (as the log below shows) while the link was lost between #2 and #3. Why would that happen?

c3850 stack configuration:

Switch#  Role     Mac Address     Priority  Version  State
------------------------------------------------------------
 1       Member   00bf.77c2.ae00  15        V02      Ready
*2       Active   00bf.77c3.1e80  1         V02      Ready
 3       Standby  ec1d.8bd5.0c80  1         V08      Ready

         Stack Port Status    Neighbors
Switch#  Port 1    Port 2     Port 1    Port 2
--------------------------------------------------------
 1       OK        OK         2         3
 2       OK        OK         3         1
 3       OK        OK         1         2

Log events:

Sep 22 18:08:52.761: %STACKMGR-1-STACK_LINK_CHANGE:Switch 2 R0/0: stack_mgr: Stack port 2 on switch 2 is down
Sep 22 18:08:52.760: %STACKMGR-1-STACK_LINK_CHANGE:Switch 3 R0/0: stack_mgr: Stack port 1 on switch 3 is down
Sep 22 18:08:53.711: %SPANTREE-5-ROOTCHANGE: Root Changed for vlan 1: New Root Port is Port-channel24. New Root Mac Address is 0019.3056.6c80
Sep 22 18:08:53.872: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-channel1, changed state to down
Sep 22 18:08:53.950: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 1 gone DOWN!
Sep 22 18:13:58.495: %STACKMGR-1-STACK_LINK_CHANGE:Switch 2 R0/0: stack_mgr: Stack port 2 on switch 2 is up
Sep 22 18:13:58.837: %STACKMGR-1-STACK_LINK_CHANGE:Switch 3 R0/0: stack_mgr: Stack port 1 on switch 3 is up
Sep 22 18:13:58.855: %STACKMGR-6-SWITCH_ADDED:Switch 2 R0/0: stack_mgr: Switch 1 has been added to the stack.
Sep 22 18:13:58.852: %STACKMGR-6-SWITCH_ADDED:Switch 3 R0/0: stack_mgr: Switch 1 has been added to the stack.
Sep 22 18:14:04.751: %PLATFORM_STACKPOWER-6-CABLE_EVENT: Switch 1 stack power cable 1 removed
Sep 22 18:14:05.907: %HMANRP-6-HMAN_IOS_CHANNEL_INFO: HMAN-IOS channel event for switch 1: EMP_RELAY: Channel UP!
Sep 22 18:14:05.991: %HMANRP-6-EMP_NO_ELECTION_INFO: Could not elect active EMP switch, setting emp active switch to 0: EMP_RELAY: Could not elect switch with mgmt port UP
Sep 22 18:15:18.131: %SIF_MGR-1-FAULTY_CABLE:Switch 1 R0/0: sif_mgr: High hardware interrupt seen on switch 1
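Cross-referencing the log with the Neighbors table may answer the second question. Stack port 2 on switch 2 and stack port 1 on switch 3 are both listed with switch 1 as their neighbor, so the two "down" events appear to describe both cables to switch 1 losing their peer at the same moment, which would fit the "Chassis 1 gone DOWN!" message a second later, rather than a break between #2 and #3. Below is a minimal Python sketch of that cross-check that also measures how long the ports stayed down. It is a hypothetical helper, not a Cisco tool: all names (NEIGHBORS, parse_link_events, peers_of_down_ports, outage_seconds) are illustrative, the regex assumes the exact STACK_LINK_CHANGE wording shown above, and the year 2019 is an assumption taken from the system-report file name because these syslog timestamps omit it.

```python
import re
from datetime import datetime

# Neighbor table copied from the "Stack Port Status / Neighbors" output above:
# switch number -> {stack port: switch on the other end of that cable}
NEIGHBORS = {
    1: {1: 2, 2: 3},
    2: {1: 3, 2: 1},
    3: {1: 1, 2: 2},
}

# The four STACK_LINK_CHANGE lines from the log above
LOG = """\
Sep 22 18:08:52.761: %STACKMGR-1-STACK_LINK_CHANGE:Switch 2 R0/0: stack_mgr: Stack port 2 on switch 2 is down
Sep 22 18:08:52.760: %STACKMGR-1-STACK_LINK_CHANGE:Switch 3 R0/0: stack_mgr: Stack port 1 on switch 3 is down
Sep 22 18:13:58.495: %STACKMGR-1-STACK_LINK_CHANGE:Switch 2 R0/0: stack_mgr: Stack port 2 on switch 2 is up
Sep 22 18:13:58.837: %STACKMGR-1-STACK_LINK_CHANGE:Switch 3 R0/0: stack_mgr: Stack port 1 on switch 3 is up
"""

# Assumption: the syslog timestamps carry no year; 2019 is taken from the
# system-report file name (..._20190923-...).
YEAR = 2019

# Captures the timestamp, port number, switch number and new state
LINK_RE = re.compile(
    r"^(\w{3} +\d+ [\d:.]+): .*Stack port (\d+) on switch (\d+) is (up|down)"
)


def parse_link_events(text):
    """Return (timestamp, switch, port, state) for each STACK_LINK_CHANGE line."""
    events = []
    for line in text.splitlines():
        m = LINK_RE.search(line)
        if m:
            ts = datetime.strptime(f"{YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S.%f")
            events.append((ts, int(m.group(3)), int(m.group(2)), m.group(4)))
    return events


def peers_of_down_ports(events, neighbors):
    """Map each (switch, port) reported down to the switch at the far end."""
    return {
        (switch, port): neighbors[switch][port]
        for _, switch, port, state in events
        if state == "down"
    }


def outage_seconds(events):
    """Seconds from the earliest 'down' event to the latest 'up' event."""
    downs = [ts for ts, _, _, state in events if state == "down"]
    ups = [ts for ts, _, _, state in events if state == "up"]
    return (max(ups) - min(downs)).total_seconds()


if __name__ == "__main__":
    events = parse_link_events(LOG)
    print(peers_of_down_ports(events, NEIGHBORS))  # {(2, 2): 1, (3, 1): 1}
    print(outage_seconds(events))                  # 306.077
```

When every down port maps to the same peer, as here, that suggests the event originated on that peer switch rather than on the cable between the other two members.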

 

 

 

Accepted Solution: Raise a TAC Case.

17 Replies

Leo Laohoo
Hall of Fame
Post the complete output of the following commands:
1. sh version; and
2. dir crashinfo-1:

Hello Leo! Here is the system info from the device. Best regards, Dmitry.

system-report_1_20190923-011010-Kras.tar.gz

Switch 1 crashed. 

Can you attach this file so we can have a look?

Hello!
System report attached.
--
Dmitry.

I'm unable to download the file.

Hello!
Hmm ...
Let's try again
--
Dmitry.

Still not working. I tried two browsers.

Hello
Try this link:
https://yadi.sk/d/ERFeERvNF2YuQA
(valid for 1 day)
--
Dmitry.

Thanks, I got it. 

In that file, there is another file called "sw-core_1_RP_0_fed_17777_20190923-010852-Kras.core". Unfortunately, I don't have the necessary tools to read its contents, but I suspect the crash was caused by CSCvb91970.

This switch member has crashed TWICE in the same month.  

I would recommend upgrading the stack to 16.3.9.  

Hope this helps.

Hello, Leo!
Thank you for your assistance! I'll follow your advice. It's strange, though: this bug should not affect 16.3.7.
I also found error messages like these appearing repeatedly in the system reports:
[87556.664000] EDAC DEVICE0: CacheErr (Dcache):ffffffffffff5701, core 1/cpu 1, cp0_errorepc == ffffffff811d543c
[87556.665000] EDAC DEVICE0: CE: cache instance: cpu1 block: cache0 'dcache'
[111737.750000] EDAC DEVICE0: CacheErr (Dcache):ffffffffffff7b01, core 1/cpu 1, cp0_errorepc == ffffffff811d9188
[111737.852000] EDAC DEVICE0: CE: cache instance: cpu1 block: cache0 'dcache'
[233861.378000] EDAC DEVICE0: CacheErr (Dcache):ffffffffffff3301, core 1/cpu 1, cp0_errorepc == ffe649f000
[233861.474000] EDAC DEVICE0: CE: cache instance: cpu1 block: cache0 'dcache'
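These are Linux EDAC (Error Detection and Correction) reports, and in EDAC terminology "CE" means a corrected error. An occasional corrected cache error can be benign, but a recurring pattern on the same core is worth tracking, for example before and after an upgrade. A minimal Python sketch for tallying them from a saved log, assuming the exact message format quoted in this thread (both the bracketed dmesg form above and the syslog-prefixed form from the later crash); the function name count_cache_errors and the SAMPLE text are illustrative:

```python
import re
from collections import Counter

# EDAC lines quoted in this thread: the bracketed dmesg form from the system
# report and the syslog-prefixed form from the later crash.
SAMPLE = """\
[87556.664000] EDAC DEVICE0: CacheErr (Dcache):ffffffffffff5701, core 1/cpu 1, cp0_errorepc == ffffffff811d543c
[87556.665000] EDAC DEVICE0: CE: cache instance: cpu1 block: cache0 'dcache'
[111737.750000] EDAC DEVICE0: CacheErr (Dcache):ffffffffffff7b01, core 1/cpu 1, cp0_errorepc == ffffffff811d9188
[233861.378000] EDAC DEVICE0: CacheErr (Dcache):ffffffffffff3301, core 1/cpu 1, cp0_errorepc == ffe649f000
Oct 05 17:17:55 sw-core_1_RP_0 kernel: EDAC DEVICE0: CacheErr (Dcache):ffffffffffff0f01, core 1/cpu 1, cp0_errorepc == ffea478d68
"""

# Captures device, cache type, status word, core and cpu from a CacheErr line;
# the follow-up "CE: cache instance" lines do not match and are skipped.
CACHE_ERR_RE = re.compile(
    r"EDAC (\w+): CacheErr \((\w+)\):([0-9a-f]+), core (\d+)/cpu (\d+)"
)


def count_cache_errors(text):
    """Count CacheErr reports per (device, cache, core, cpu)."""
    counts = Counter()
    for line in text.splitlines():
        m = CACHE_ERR_RE.search(line)
        if m:
            device, cache, _status, core, cpu = m.groups()
            counts[(device, cache, int(core), int(cpu))] += 1
    return counts


if __name__ == "__main__":
    for key, n in count_cache_errors(SAMPLE).items():
        print(key, n)  # ('DEVICE0', 'Dcache', 1, 1) 4
```

Counting per core rather than per line makes it easier to see whether the errors stay on one CPU, which is the kind of pattern worth mentioning in a TAC case.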

--
Best regards, Dmitry.

Hello!
The stack crashed 7 hours after the software upgrade from 16.3.7 to 16.3.9.
I think it's a hardware problem, based on these log messages:
Oct 05 17:17:55 sw-core_1_RP_0 kernel: EDAC DEVICE0: CacheErr (Dcache):ffffffffffff0f01, core 1/cpu 1, cp0_errorepc == ffea478d68
Oct 05 17:17:55 sw-core_1_RP_0 kernel: EDAC DEVICE0: CE: cache instance: cpu1 block: cache0 'dcache'
Oct 05 17:19:00 sw-core_1_RP_0 kernel: device_release(80000001044a8c78,80000000e9eb1100)
Oct 05 17:19:00 sw-core_1_RP_0 kernel: LSMPI: Deregister dual stack diverter

--
Dmitry.


@danilov.do wrote:
Oct 05 17:19:00 sw-core_1_RP_0 kernel: device_release(80000001044a8c78,80000000e9eb1100)
Oct 05 17:19:00 sw-core_1_RP_0 kernel: LSMPI: Deregister dual stack diverter

Dmitry, 

Is Dot1x enabled?  

The above messages closely match CSCvp58583.

Hello, Leo!
Dot1x is not enabled.
--
Dmitry.

Raise a TAC Case.