
3850-48XS virtual switch stack problems

kikkof
Level 1

Hi,

We have two pairs of 3850-48XS switches (C and K), each pair stacked in a different datacenter, and both stacks are experiencing dual-active events and reloads.

All the switches are running Denali 16.3.3 with the same license level.

Each pair is configured as a StackWise Virtual stack using, per switch, two FortyGigabitEthernet interfaces as the StackWise Virtual link plus one 10GbE interface for dual-active detection.

The two pairs are connected by two Layer 2 links configured as plain trunks (there is no stacking between C and K).

Given the poor documentation, I'm asking for any ideas: the stacks are in production and are causing outages.

 

C pair

...

switch 1 provision ws-c3850-48xs
switch 2 provision ws-c3850-48xs
stackwise-virtual
 domain 1

...

interface FortyGigabitEthernet1/1/1
 stackwise-virtual link 1
 !
interface FortyGigabitEthernet1/1/2
 stackwise-virtual link 1

...

interface FortyGigabitEthernet2/1/2
 stackwise-virtual link 1
 !
...

interface FortyGigabitEthernet2/1/4
 stackwise-virtual link 1

 

c#show stackwise-virtual neighbors
Stackwise Virtual Link(SVL) Neighbors Information:
--------------------------------------------------
Switch  SVL     Local Port                  Remote Port
------  ---     ----------                  -----------
1       1       FortyGigabitEthernet1/1/1   FortyGigabitEthernet2/1/2
                FortyGigabitEthernet1/1/2   FortyGigabitEthernet2/1/4
2       1       FortyGigabitEthernet2/1/2   FortyGigabitEthernet1/1/1
                FortyGigabitEthernet2/1/4   FortyGigabitEthernet1/1/2

 

c#show stackwise-virtual dual-active-detection
Dual-Active-Detection Configuration:
-------------------------------------
Switch  Dad port
------  ------------
1       TenGigabitEthernet1/0/44  
2       TenGigabitEthernet2/0/44 

 

K pair

 

switch 1 provision ws-c3850-48xs
switch 2 provision ws-c3850-48xs
stackwise-virtual
 domain 2

...

interface FortyGigabitEthernet1/1/1
 stackwise-virtual link 1
 !
...
interface FortyGigabitEthernet1/1/3
 stackwise-virtual link 1

...

interface FortyGigabitEthernet2/1/2
 stackwise-virtual link 1
 !
...
!
interface FortyGigabitEthernet2/1/4
 stackwise-virtual link 1

 

k#show stackwise-virtual neighbors
Stackwise Virtual Link(SVL) Neighbors Information:
--------------------------------------------------
Switch  SVL     Local Port                  Remote Port
------  ---     ----------                  -----------
1       1       FortyGigabitEthernet1/1/1   FortyGigabitEthernet2/1/4
                FortyGigabitEthernet1/1/3   FortyGigabitEthernet2/1/2
2       1       FortyGigabitEthernet2/1/2   FortyGigabitEthernet1/1/3
                FortyGigabitEthernet2/1/4   FortyGigabitEthernet1/1/1

 

k#show stackwise-virtual dual-active-detection
Dual-Active-Detection Configuration:
-------------------------------------
Switch  Dad port
------  ------------
1       TenGigabitEthernet1/0/44  
2       TenGigabitEthernet2/0/44  
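
The neighbor tables above can be cross-checked mechanically. Below is a minimal Python sketch, assuming the output of show stackwise-virtual neighbors has been saved as text. The names parse_neighbors and asymmetric_links are hypothetical helpers, not part of any Cisco tooling. It checks that every local/remote port pair also appears in the reverse direction; the sample is the K output shown above.

```python
import re

# Matches interface names such as FortyGigabitEthernet1/1/1.
PORT = r"[A-Za-z]+\d+/\d+/\d+"


def parse_neighbors(text):
    """Return {local_port: remote_port} from saved 'show stackwise-virtual neighbors' text."""
    links = {}
    for line in text.splitlines():
        ports = re.findall(PORT, line)
        if len(ports) == 2:  # data rows carry exactly a local and a remote port
            links[ports[0]] = ports[1]
    return links


def asymmetric_links(links):
    """Return (local, remote) pairs whose reverse entry is missing or different."""
    return [(local, remote) for local, remote in links.items()
            if links.get(remote) != local]


K_OUTPUT = """\
Switch  SVL     Local Port                  Remote Port
------  ---     ----------                  -----------
1       1       FortyGigabitEthernet1/1/1   FortyGigabitEthernet2/1/4
                FortyGigabitEthernet1/1/3   FortyGigabitEthernet2/1/2
2       1       FortyGigabitEthernet2/1/2   FortyGigabitEthernet1/1/3
                FortyGigabitEthernet2/1/4   FortyGigabitEthernet1/1/1
"""

links = parse_neighbors(K_OUTPUT)
print(asymmetric_links(links))
```

An empty list means both switches agree on how the SVL ports are cabled; any pair that is returned deserves a look at the physical cabling.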

We noticed two strange things:

1) The show platform output is inconsistent on the C stack (switch 1 reports N/A for model, serial number, and versions):

c#show platform
Switch  Ports    Model                Serial No.   MAC address     Hw Ver.       Sw Ver.
------  -----   ---------             -----------  --------------  -------       --------
 1       68     N/A                   N/A          <CORRECTMAC-C1>  N/A           N/A           
 2       68     WS-C3850-48XS-E       <CORRECTSERIAL-C2>  <CORRECTMAC-C2> V02           16.3.3        
Switch/Stack Mac Address : <CORRECTMAC-C2> - Local Mac Address
Mac persistency wait time: Indefinite
                                   Current
Switch#   Role        Priority      State
-------------------------------------------
*1       Active          1          Ready               
 2       Standby         1          Ready

 

k#show platform
Switch  Ports    Model                Serial No.   MAC address     Hw Ver.       Sw Ver.
------  -----   ---------             -----------  --------------  -------       --------
 1       68     WS-C3850-48XS-E       <CORRECTSERIAL-K1>  <CORRECTMAC-K1> V02           16.3.3        
 2       68     WS-C3850-48XS-E       <CORRECTSERIAL-K2>  <CORRECTMAC-K2> V02           16.3.3        
Switch/Stack Mac Address : <CORRECTSERIAL-K2> - Local Mac Address
Mac persistency wait time: Indefinite
                                   Current
Switch#   Role        Priority      State
-------------------------------------------
*1       Active          1          Ready               
 2       Standby         1          Ready

 

2) If we enable StackWise Virtual debugging (debug stackwise-virtual all), the result is hundreds of lines like the following. It seems to be looking for a switch numbered 12 (on the other pair, switch 16 is mentioned instead).

The switches are all Cisco refurbished units.

 

k: NVGEN of the Stackwise Enable/Disable
k: Dec 26 11:44:02.355: SWV:Configure SVL link Action
k: Dec 26 11:44:02.361: Valid Switch information not available for  switch 12
k: Dec 26 11:44:02.361: Stackwise Virtual DB switch and DSL get  returned false
k: Dec 26 11:44:02.368: SWV:Configure SVL link Action
k: Dec 26 11:44:02.373: Valid Switch information not available for  switch 12
k: Dec 26 11:44:02.373: Stackwise Virtual DB switch and DSL get  returned false
k: Dec 26 11:44:02.379: SWV:Configure SVL link Action
k: Dec 26 11:44:02.385: Valid Switch information not available for  switch 12
k: Dec 26 11:44:02.385: Stackwise Virtual DB switch and DSL get  returned false
k: Dec 26 11:44:02.399: SWV:Configure SVL link Action
k: Dec 26 11:44:02.406: Fo1/1/1 is on DSL 1 on Switch 1
k: Dec 26 11:44:02.407: SWV: Writing DSL information
k: Dec 26 11:44:02.409: SWV:Configure SVL link Action
k: Dec 26 11:44:02.415: Stackwise Virtual DB switch number not valid
k: Dec 26 11:44:02.422: SWV:Configure SVL link Action
k: Dec 26 11:44:02.429: Fo1/1/3 is on DSL 1 on Switch 1
k: Dec 26 11:44:02.429: SWV: Writing DSL information
k: Dec 26 11:44:02.431: SWV:Configure SVL link Action
k: Dec 26 11:44:02.438: Stackwise Virtual DB switch number not valid
k: Dec 26 11:44:02.445: SWV:Configure SVL link Action
k: Dec 26 11:44:02.451: Stackwise Virtual DB switch number not valid
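
With hundreds of such lines, a tally is easier to read than the raw debug. The following is a minimal Python sketch, assuming the debug output was captured to text in the format shown above (hostname prefix, timestamp, message). The function name tally_debug is a hypothetical helper. It counts each distinct message and collects the switch numbers named in the "not available" lines.

```python
import re
from collections import Counter

# Prefix such as "k: Dec 26 11:44:02.361: " (hostname tag plus timestamp).
TIMESTAMP = re.compile(r"^\w+: \w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d+: ")


def tally_debug(lines):
    """Count distinct debug messages and the switch numbers they complain about."""
    messages = Counter()
    switch_numbers = Counter()
    for line in lines:
        body = TIMESTAMP.sub("", line.strip())
        body = re.sub(r"\s+", " ", body)  # collapse the double spaces in the output
        messages[body] += 1
        match = re.search(r"not available for switch (\d+)", body)
        if match:
            switch_numbers[int(match.group(1))] += 1
    return messages, switch_numbers


DEBUG = [
    "k: Dec 26 11:44:02.355: SWV:Configure SVL link Action",
    "k: Dec 26 11:44:02.361: Valid Switch information not available for  switch 12",
    "k: Dec 26 11:44:02.361: Stackwise Virtual DB switch and DSL get  returned false",
    "k: Dec 26 11:44:02.368: SWV:Configure SVL link Action",
    "k: Dec 26 11:44:02.373: Valid Switch information not available for  switch 12",
    "k: Dec 26 11:44:02.415: Stackwise Virtual DB switch number not valid",
]

messages, switch_numbers = tally_debug(DEBUG)
for text, count in messages.most_common():
    print(count, text)
print(dict(switch_numbers))
```

Run against the full capture from each stack, this would show at a glance whether 12 and 16 are the only unexpected switch numbers.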

 

The log during a problem:

 

c: Dec 26 02:45:27.199: %STACKMGR-6-SWITCH_REMOVED:Switch 2 R0/0: stack_mgr:  Switch 16 has been removed from the stack.
c: Dec 26 02:45:27.295: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_NOT_PRESENT)
c: Dec 26 02:45:27.295: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_DOWN)
c: Dec 26 02:45:27.295: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_REDUNDANCY_STATE_CHANGE)
c: Dec 26 02:45:27.501: %HMANRP-5-CHASSIS_DOWN_EVENT: Chassis 1 gone DOWN!
c: Dec 26 02:45:27.568: %RF-5-RF_RELOAD: Peer reload. Reason: EHSA standby down
c: Dec 26 02:46:56.020: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host <LOGHOSTIP> port 5140 started - reconnection
c: Dec 26 02:47:08.822: NTP Core (INFO): <NTPHOSTIP> 8014 84 reachable
c: Dec 26 02:47:08.822: NTP Core (INFO): <NTPHOSTIP> 962A 8A sys_peer
c: Dec 26 02:47:08.822: NTP Core (NOTICE): trans state : 5
c: Dec 26 02:47:08.822: NTP Core (NOTICE): Clock is synchronized.
c: Dec 26 02:51:24.819: NTP Core (INFO): <NTPHOSTINTERNALIP> 8014 84 reachable
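
To compare incidents, the syslog lines can be reduced to a timeline of %FACILITY-SEVERITY-MNEMONIC events. This is a minimal Python sketch, assuming the log format shown above. The function name parse_events is hypothetical, and because the log lines carry no year, the year argument is an arbitrary placeholder that only matters for computing differences.

```python
import re
from datetime import datetime

# Captures timestamp, facility, severity and mnemonic from lines such as
# "c: Dec 26 02:45:27.568: %RF-5-RF_RELOAD: Peer reload. ..."
EVENT = re.compile(
    r"^\w+: (\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d+): %([A-Z0-9_]+)-(\d)-([A-Z0-9_]+):"
)


def parse_events(lines, year=2000):
    """Return (timestamp, facility, severity, mnemonic) for each syslog-style line."""
    events = []
    for line in lines:
        match = EVENT.match(line)
        if not match:
            continue  # e.g. NTP lines, which have no %FACILITY-SEV-MNEMONIC tag
        stamp, facility, severity, mnemonic = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")
        events.append((when, facility, int(severity), mnemonic))
    return events


LOG = [
    "c: Dec 26 02:45:27.199: %STACKMGR-6-SWITCH_REMOVED:Switch 2 R0/0: stack_mgr:  Switch 16 has been removed from the stack.",
    "c: Dec 26 02:45:27.295: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_NOT_PRESENT)",
    "c: Dec 26 02:45:27.568: %RF-5-RF_RELOAD: Peer reload. Reason: EHSA standby down",
    "c: Dec 26 02:47:08.822: NTP Core (NOTICE): Clock is synchronized.",
]

events = parse_events(LOG)
elapsed = events[-1][0] - events[0][0]
print([event[3] for event in events], elapsed)
```

The elapsed value is the time between the switch removal and the peer reload message, which makes it easy to see whether every incident follows the same sequence.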

 

Thank you for your attention,

Francesco


Austin Sabio
Level 4

Examine the crashinfo files and open a Cisco TAC case for official guidance and a resolution, since this is impacting production. Use the reference below to troubleshoot.

REF

https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/201070-Troubleshooting-3650-3850-reloads-by-sta.html

SUSPECTED_BUG

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuu88630

I hope this helps and good luck!

-Austin

Thank you Austin for replying.

 

I confirm that the crashinfo logs contain a lot of interesting information, and I will open a TAC case as soon as possible.

 

Best,

Francesco

While waiting for Cisco to transfer the contracts from our reseller to us, we lost the stacks a couple of times.

I've read about a couple of caveats fixed in Denali 16.3.5 (we are currently running 16.3.3).

CSCve59027 16.3.3 SV stack sometimes results in unpredictable switchover after a few days of operation

CSCvd67113 3850 stack startup config sometimes disappear after power cycle

We will wait for Cisco (TAC), but we are planning a maintenance window to update to 16.3.5b.

In the meantime, we have attached console cables so that we can reload switches when they leave the stack.

I've seen that with VSS it is possible to keep management ports from entering the errdisable state when dual active is detected, but unfortunately this does not seem possible with StackWise Virtual, at least on Denali (am I right?).

I'll keep the post updated.

Francesco

ebenav11
Level 1

Hi Francesco, opening a Service Request with TAC is a good action plan.

I reviewed the configuration guide on CCO. Have you configured "stackwise-virtual dual-active-detection" in your setup?

Could you please share the outputs of:

 

show stackwise-virtual switch number <1-2>
show stackwise-virtual link
show stackwise-virtual bandwidth
show stackwise-virtual neighbors
show stackwise-virtual dual-active-detection

BR!

Thank you for replying,

Yes, we are using DAD; the output of the suggested commands follows.

The "Stackwise Virtual Configuration After Reboot:" section appears because of configuration changes that are not real changes. If you issue show startup-config, you won't see some of the stackwise commands (there is an open issue about commands being lost from the startup config, see previous messages), and re-issuing those commands causes the switch to print that section.

Ciao,

Francesco

 

c#show startup

facility-alarm critical exceed-action shutdown <==== According to Cisco this is no longer effective
switch 1 provision ws-c3850-48xs
switch 2 provision ws-c3850-48xs
!     <====== Normally here you can find the stackwise-virtual command
!     <====== Normally here you can find the domain <domain-number> command
!
!
ip routing

...
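
A quick way to catch this before a reload is to compare the two configs for the global StackWise Virtual lines. This is a minimal Python sketch, assuming both configs were saved to text by some other means. The function name missing_svl_lines is hypothetical, and the prefix match is deliberately naive (it only looks for lines starting with "stackwise-virtual" or "domain ").

```python
def missing_svl_lines(running, startup):
    """Return StackWise Virtual lines present in running-config but absent from startup-config."""
    wanted = ("stackwise-virtual", "domain ")
    startup_lines = {line.strip() for line in startup.splitlines()}
    return [line.strip() for line in running.splitlines()
            if line.strip().startswith(wanted) and line.strip() not in startup_lines]


# Global section as it appears in the running configuration of the C stack.
RUNNING = """\
switch 1 provision ws-c3850-48xs
switch 2 provision ws-c3850-48xs
stackwise-virtual
 domain 1
!
ip routing
"""

# Same section of the startup configuration, with the commands gone.
STARTUP = """\
facility-alarm critical exceed-action shutdown
switch 1 provision ws-c3850-48xs
switch 2 provision ws-c3850-48xs
!
!
!
!
ip routing
"""

print(missing_svl_lines(RUNNING, STARTUP))
```

Any line that is reported would be lost at the next reload unless the configuration is saved again and the startup config is re-checked.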

 

c#show stackwise-virtual switch 1
Stackwise Virtual Configuration:
--------------------------------
Stackwise Virtual : Enabled
Domain Number : 1  

Switch  Stackwise Virtual Link  Ports
------  ----------------------  ------
1       1                       FortyGigabitEthernet1/1/1
                                FortyGigabitEthernet1/1/2

Stackwise Virtual Configuration After Reboot:
---------------------------------------------
Stackwise Virtual : Enabled
Domain Number : 1  

Switch  Stackwise Virtual Link  Ports
------  ----------------------  ------
1       1                       FortyGigabitEthernet1/1/1
                                FortyGigabitEthernet1/1/2

c#show stackwise-virtual switch 2
Stackwise Virtual Configuration:
--------------------------------
Stackwise Virtual : Enabled
Domain Number : 1  

Switch  Stackwise Virtual Link  Ports
------  ----------------------  ------
2       1                       FortyGigabitEthernet2/1/2
                                FortyGigabitEthernet2/1/4

Stackwise Virtual Configuration After Reboot:
---------------------------------------------
Stackwise Virtual : Enabled
Domain Number : 1  

 

c#show stackwise-virtual link
Stackwise Virtual Link(SVL) Information:
----------------------------------------
Flags:
------
Link Status
-----------
U-Up D-Down
Protocol Status
---------------
S-Suspended P-Pending E-Error T-Timeout R-Ready
-----------------------------------------------
Switch  SVL     Ports                           Link-Status     Protocol-Status
------  ---     -----                           -----------     ---------------
1       1       FortyGigabitEthernet1/1/1       U               R              
                FortyGigabitEthernet1/1/2       U               R              
2       1       FortyGigabitEthernet2/1/2       U               R              
                FortyGigabitEthernet2/1/4       U               R              

Switch  Stackwise Virtual Link  Ports
------  ----------------------  ------
2       1                       FortyGigabitEthernet2/1/2
                                FortyGigabitEthernet2/1/4

 

c#show stackwise-virtual bandwidth
Switch  Bandwidth
------  ---------
1       80  
2       80

 

c#show stackwise-virtual neighbors
Stackwise Virtual Link(SVL) Neighbors Information:
--------------------------------------------------
Switch  SVL     Local Port                  Remote Port
------  ---     ----------                  -----------
1       1       FortyGigabitEthernet1/1/1   FortyGigabitEthernet2/1/2
                FortyGigabitEthernet1/1/2   FortyGigabitEthernet2/1/4
2       1       FortyGigabitEthernet2/1/2   FortyGigabitEthernet1/1/1
                FortyGigabitEthernet2/1/4   FortyGigabitEthernet1/1/2

 

c#show stackwise-virtual dual-active-detection
Dual-Active-Detection Configuration:
-------------------------------------
Switch  Dad port
------  ------------
1       TenGigabitEthernet1/0/44  
2       TenGigabitEthernet2/0/44  

To keep the post updated: we finally migrated to 16.3.5b, but not without problems.

 

We updated one of the two stacks smoothly, but a few minutes later the other stack started to black-hole traffic.

The only way to get traffic flowing normally again was to switch off one of the two switches in the outdated stack. At that point we had one stack on 16.3.5b up and running and a single switch running 16.3.3.

The two stacks are directly connected via a pair of Layer 2 links; they can see each other via CDP, but there is no stacking relationship between them.

The day after, we updated the second stack and everything went fine.

 

The update procedure is a bit scary but works very well and is described in the release notes.

In our case (16.3.3 to 16.3.5b) there are two matching sections in the release notes containing the same procedure:

 

Upgrading from Cisco IOS XE Denali 16.1.1 to 16.1.x, 16.2.x, or 16.3.x in Install Mode

 

Upgrading from Cisco IOS XE Denali 16.3.x to Cisco IOS XE 16.x in Install Mode

 

In the end, my technical recommendation is obvious: go to the latest stable version of IOS when dealing with these machines.

We are still waiting for Cisco to transfer the support contract from the reseller to us. It is taking weeks, and we still cannot open a TAC case.

My commercial suggestion is obvious too: check the status of your contracts from day one. Yes, this is a just-in-time world, but moving a contract can be a nightmare.

 

Francesco

Hi,

 

May I ask how stable the virtual stack has been since the upgrade? We are faced with the choice of either spanning a Layer 2 connection across data centres and using HSRP between two network segments, or exploring StackWise Virtual. Just looking for some feedback after the upgrade.

 

Thanks

Hi All, 

 

Was this closed?
