Re: Catalyst 3650 Switch Stack misbehavior - No VLAN traffic on one switch

fcabianca · 08-24-2018

Last night we experienced something really unusual:

We had to perform some maintenance on one of our switch stacks.

It entailed powering down the top switch, which happened to be the master.

When it came back up, it rejoined the stack as a member (the master role had been reassigned) and everything looked OK.

However, we soon realized that for one specific VLAN the switch was not passing any traffic, whereas the other switches in the stack had no issues with the same VLAN.

In the end, the only thing that solved the problem was reloading the entire stack.

Has anyone experienced a similar problem?

Any feedback will be appreciated.

Stack:

4 x 3650 running 16.3.6

Reza Sharifi · 08-24-2018

Is it possible that the config was not saved before shutting it down?

If it was saved, you may be hitting a bug in the OS version you are running. You may want to open a case with TAC.

HTH

fcabianca · 08-24-2018

Reza,

The config had been saved.

Yes, we'll probably end up opening a case with TAC.

Thank you!

Diana Karolina Rojas · 08-24-2018

Hello fcabianca!

I have not seen this specific behavior before, but 3650 and 3850 stacks can enter an unstable state when changes to the stack affect the master. I strongly recommend reading this document:

https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/201070-Troubleshooting-3650-3850-reloads-by-sta.html

Maybe this is what happened to you:

"Recovering an Unstable Stack

There may be some situations where you see a several members in a stack reloading during boot up after the stack election process takes place. If a reloaded switch believes itself to be the stack master then this can often lead to a stack merge event and will enter into a boot loop state. In this situation, it may be advisable to ask the customer:

- Power down the entire stack and reseat all the stack cables firmly.

- Power-on each member switch in the stack one by one until all members have converged to its expected state.

- In cases where a member fails to join the stack, remove this from the stack and try booting this individual as a standalone to troubleshoot further."

Please do not forget to rate useful posts.

Best Regards,

fcabianca · 08-24-2018

Hello Diana,

I greatly appreciate your input.

We'll definitely take a look at the link you provided.

Thank you!

pieterh · 08-27-2018

Do you have information on the uptime of the other stack members?

I recently encountered something like this, where the behaviour appeared even BEFORE the first stack member was rebooted (rebooting it did not solve the issue).

Looking further, I discovered that the other stack members had an uptime of more than 3 years!

A reboot of the complete stack did resolve the issue.

My guess: some resource or counter had reached a limit after that much uptime.
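
To illustrate that guess (and only the guess, since nothing in this thread confirms the actual cause), here is a minimal Python sketch of how long a fixed-width tick counter can run before it wraps around. The counter widths and tick rates are hypothetical examples chosen for the arithmetic; they are not taken from Cisco documentation or from the 3650 software.

```python
SECONDS_PER_DAY = 86_400


def days_until_wrap(bits: int, ticks_per_second: int) -> float:
    """Days of uptime before an unsigned counter of `bits` width overflows."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_DAY


def wrapped_value(elapsed_ticks: int, bits: int) -> int:
    """Value the counter actually holds after `elapsed_ticks` ticks."""
    return elapsed_ticks % (2 ** bits)


if __name__ == "__main__":
    # Hypothetical (width, tick rate) pairs, for illustration only.
    for bits, rate in [(32, 1000), (32, 100), (31, 100)]:
        days = days_until_wrap(bits, rate)
        print(f"{bits}-bit counter at {rate} ticks/s wraps after {days:.1f} days")

    # After three years of uptime, a 32-bit counter at 100 ticks/s no longer
    # holds the true tick count, only the remainder after wrapping.
    three_years_ticks = 3 * 365 * SECONDS_PER_DAY * 100
    print(wrapped_value(three_years_ticks, 32) < three_years_ticks)
```

If a wrap like this were involved, comparing the uptime of each stack member against such thresholds would be one way to spot a pattern, but a TAC case remains the way to confirm a real defect.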