michael.yurchenko · ‎11-28-2016

Suppose I have two 3850 switches in a stack. Each access switch is connected to both stack switches through EtherChannel. The outside firewall and router are likewise connected to both stack switches through EtherChannel. So if one switch in the stack is down, as long as the second switch is still passing traffic, I'm up and running and my users are happy.

If one switch dies, the second takes over, and I can bring in replacement hardware and rebuild the stack. No downtime.

Suppose I want to upgrade the IOS on them without downtime. What could I do?

1) Break the stack, upgrade the master (while that's going on, the second switch is the master of his own domain), then rebuild the stack (the second switch will be auto-upgraded, but the first switch should stay up).

2) Load the firmware onto the second switch, change the "boot system" setting, make the second switch master, and reload the second switch (while that is happening the first switch is up). When the second switch comes back up, would it become the master and keep the traffic flowing while the first gets the new software?

3) Any other options?

To be clear, as long as at least one switch is up and running, our network is happy.

Reza Sharifi · ‎11-28-2016

Upgrading IOS will require you to reboot the stack, and as far as I know, there is no way to do this without a short downtime for the reboot. The procedure you described is lengthy, dangerous, and not recommended.

In addition, running a stack of switches with two different IOS versions is not good practice and can cause more downtime than you anticipated. The simplest way is to load the IOS onto both switches (using the USB port), verify that the new IOS loaded with no issues, change the boot variable, plan a 15-minute maintenance window after hours, and reload the stack.

HTH

Reuben Farrelly · ‎11-28-2016

What you may be able to get away with is:

- Load the image onto the stack (make sure all members get the image)

- Reload slot 2

- Wait 2 or 3 minutes for the switch to almost but not quite come back

- Reload slot 1 before slot 2 comes back up and before the 'mismatch' becomes a problem

I haven't tried it; however, it may reduce your downtime.
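As a rough way to reason about the staggered reload described above, the following Python sketch estimates how long both stack members would be down at once. It is only back-of-the-envelope arithmetic, not a tested procedure: the function name `overlap_outage` and the 300-second boot time are hypothetical, and the model assumes both switches take the same time to boot and start forwarding as soon as they are up.

```python
def overlap_outage(boot_seconds: float, delay_seconds: float) -> float:
    """Seconds during which both stack members are down at the same time.

    Simplified model (an assumption, not measured behaviour):
      - slot 2 is down over [0, boot_seconds)
      - slot 1 is reloaded delay_seconds later, so it is down over
        [delay_seconds, delay_seconds + boot_seconds)
    The overlap of those two intervals is boot_seconds - delay_seconds,
    floored at zero.
    """
    if boot_seconds < 0 or delay_seconds < 0:
        raise ValueError("durations must be non-negative")
    return max(0.0, boot_seconds - delay_seconds)


if __name__ == "__main__":
    boot = 300.0  # hypothetical boot time in seconds; measure your own hardware
    for delay in (0.0, 120.0, 180.0):
        gap = overlap_outage(boot, delay)
        print(f"reload slot 1 after {delay:.0f}s -> both down for about {gap:.0f}s")
```

A delay of zero corresponds to reloading the whole stack at once. A delay equal to or longer than the boot time gives zero overlap in this model, but that is exactly the case where slot 2 rejoins on the new image while slot 1 still runs the old one, which is the mismatch the steps above try to avoid and which this sketch does not model.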

michael.yurchenko · ‎11-29-2016

Y'all are welcome to safely assume that I've searched and googled about this.

I'm looking for something along the lines of practical advice that would be based on equipment testing.

kurdtkuei · ‎09-25-2018

Hi Michael,

Have you found a solution to that?

It can't be that a stack configuration is fully redundant, yet the one situation where you have no redundancy is when you're upgrading the switches.

We have a similar situation: a fully redundant stack, and I can't do a 15-minute maintenance window, because many ESX hosts with hundreds of virtual machines accessing SAN disks hang off the stack. I would have to shut down all the machines, and that's a task of hours, not minutes.

Warm upgrading of the 3850 stack?