08-24-2018 05:51 AM - edited 03-08-2019 03:58 PM
Last night we experienced something really unusual:
We had to perform some maintenance on one of our switch stacks.
It entailed powering down the top switch, which happened to be the master.
When it came back up, it rejoined the stack as a member (the master role had been reassigned) and everything looked OK.
However, we soon realized that for one specific VLAN the switch was not passing any traffic, whereas the other switches in the stack had no issues with the same VLAN.
In the end, the only thing that solved the problem was reloading the entire stack.
Have any of you ever experienced a similar problem?
Any feedback will be appreciated.
Stack:
4 x 3650 running 16.3.6
08-24-2018 07:05 AM
Is it possible that the config was not saved before shutting it down?
If it was, you may be hitting a bug in the OS you are running. You may want to open a case with TAC.
HTH
08-24-2018 11:59 AM
Reza,
The config had been saved.
Yes, we'll probably end up opening a case with TAC.
Thank you!
08-24-2018 07:30 AM - edited 08-24-2018 07:31 AM
Hello fcabianca!
I have not seen this specific behavior before, but 3650 and 3850 stacks can become unstable when you make changes to the stack and those changes affect the master. I strongly recommend that you read this document:
Maybe this is what happened to you:
"Recovering an Unstable Stack
There may be some situations where you see a several members in a stack reloading during boot up after the stack election process takes place. If a reloaded switch believes itself to be the stack master then this can often lead to a stack merge event and will enter into a boot loop state. In this situation, it may be advisable to ask the customer:
- Power down the entire stack and reseat all the stack cables firmly.
- Power-on each member switch in the stack one by one until all members have converged to its expected state.
- In cases where a member fails to join the stack, remove this from the stack and try booting this individual as a standalone to troubleshoot further."
Please do not forget to rate useful posts.
Best Regards,
08-24-2018 11:55 AM
Hello Diana,
I greatly appreciate your input.
We'll definitely take a look at the link you provided.
Thank you!
08-27-2018 12:25 AM
Do you have information on the uptime of the other stack members?
I recently encountered something like this, where the problem appeared even BEFORE the first stack member was rebooted (and rebooting that member did not solve the issue).
Looking further, I discovered that the other stack members had an uptime of more than 3 years!
A reboot of the complete stack did resolve the issue.
My guess: some resource/counter has reached a limit after this uptime.
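If you want to check for that quickly across several stacks, here is a minimal Python sketch that flags members with a very long uptime. It assumes you have already collected one uptime line per member in the "N years, N weeks, N days, N hours, N minutes" wording that "show version" prints. The function names, the 365-day threshold and the sample lines are all hypothetical, chosen only for illustration.

```python
import re

# Approximate unit lengths in seconds. A "year" is treated as 365 days,
# which is close enough for a simple threshold check.
UNIT_SECONDS = {
    "year": 365 * 86400,
    "week": 7 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
}


def uptime_seconds(line):
    """Sum every '<number> <unit>' pair found in an uptime line."""
    total = 0
    for count, unit in re.findall(r"(\d+)\s+(year|week|day|hour|minute)s?", line):
        total += int(count) * UNIT_SECONDS[unit]
    return total


def long_running(uptimes, max_days=365):
    """Return the names of members whose uptime exceeds max_days."""
    limit = max_days * 86400
    return [name for name, line in uptimes.items() if uptime_seconds(line) > limit]


# Hypothetical sample lines, not taken from the stack discussed in this thread.
sample = {
    "switch-1": "uptime is 2 hours, 14 minutes",
    "switch-2": "uptime is 3 years, 4 weeks, 2 days, 5 hours, 12 minutes",
}
print(long_running(sample))
```

A long uptime on its own proves nothing, but it tells you which members were never reloaded and is worth including when you open the TAC case.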