I have completed the Auto Install process a few times, but mostly with minimal production traffic in place. I have one scheduled for this weekend and would like clarification on whether it is best practice to shut down the network and storage ports on the subordinate FI before starting the Auto Install. Some references and videos depict this, but I don't see it listed in any of the Auto Install guidelines. If this is best practice, do you first disable the ports on the subordinate, then once it is upgraded bring those interfaces up and verify the data path is good, and then disable the ports on the primary before acknowledging the reboot?
I don't think this is best practice!
Before you start any upgrade, make sure your OS has multipathing (FC) and teaming (Ethernet) working.
The infrastructure Auto Install will upgrade the subordinate FI first (after UCSM and the IOMs) and then wait for an ACK from your side.
Before you ACK, check the cluster status, the IOMs, the northbound links from the subordinate FI, and OS multipathing (FC) and teaming (Ethernet).
Once this is OK, ACK!
Good luck, Walter.
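The pre-ACK checks listed above can be sketched as a simple go/no-go gate in Python. This is only an illustration: the `PreAckStatus` fields and the `blocking_checks` helper are hypothetical names, and the values are assumed to be filled in by hand (or from whatever monitoring you already have), not read from UCSM.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PreAckStatus:
    """Hypothetical record of the checks to make before acknowledging the primary reboot."""
    cluster_ha_ready: bool   # cluster status is healthy
    ioms_operable: bool      # IOMs are back after the upgrade
    uplinks_up: bool         # northbound links from the subordinate FI are up
    fc_multipath_ok: bool    # OS multipathing (FC) shows both paths
    eth_teaming_ok: bool     # OS teaming (Ethernet) shows both links


def blocking_checks(status: PreAckStatus) -> list[str]:
    """Return the names of the checks that failed; an empty list means it is OK to ACK."""
    return [name for name, ok in vars(status).items() if not ok]


if __name__ == "__main__":
    status = PreAckStatus(True, True, False, True, True)
    failed = blocking_checks(status)
    print("OK to ACK" if not failed else f"Do not ACK yet: {failed}")
```

The point of returning the failed check names rather than a single boolean is that the person holding the ACK sees exactly what still needs attention.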
Actually it now is. At Cisco Live this past week, in a session about UCS, we were told the new recommended best practice is to put the subordinate FI into Fabric Evacuation mode. This allows you to simulate the FI being offline before it goes offline. This way you can detect any server or uplink issues before you initiate the reboot. Fabric Evacuation can be enabled on the Fabric Interconnect object in the Equipment tab. Should something go down, you can quickly turn evacuation off, whereas once you start the upgrade the fabric interconnect is down for 15-20 minutes, during which you can do nothing to restore service.
I think the clear statement is: Fabric Evacuation mode is not a simulation; it does stop all the traffic that is active through a Fabric Interconnect.
I would rather check first that my OS multipathing / teaming is working than realize after evacuation that something is wrong.
Don't forget: UCS is OS agnostic and
Fabric evacuation is supported only with the following:
That's awfully hard to coordinate in a large domain. We can't get everyone in the same place at the same time to validate OSes.
We tell the OS admins how to set their servers up. We tell them when we are doing the upgrade and when the test failover will occur. That's how we validate before firing the Auto Install.
Thanks for all the information...so from what I understand, Fabric Evacuation actually stops traffic, similar to shutting down the uplink ports. It is more "controlled" because it is a one-click process and can be reverted by switching it off. This would allow you to determine whether any paths are broken before the upgrade, which could have the interconnect down for an extended period of time...
So the procedure is to enable Fabric Evacuation on the subordinate interconnect, verify paths, start the Auto Install...this is where I am somewhat confused...once that fabric is upgraded and back online, you disable evacuation...all paths are restored...do you make this the primary at that point and run the same process on the non-upgraded fabric before acknowledging the Auto Install reboot of that fabric?
Again thanks for the info...
Here is our procedure:
1) identify fabric subordinate and document current port states (this is done automatically now in 3.1 I believe -- they have a new "baseline" feature that takes a configuration image at auto install time and then diffs it to show you what didn't come back)
2) inform system owners that redundancy will be lost on fabric X
3) enable fabric evacuation on subordinate
4) validate with system owners no issues; go after 15 minutes
5) activate firmware auto install; subordinate will reboot automatically after UCSM upgrade
6) once subordinate returns and UCSM is waiting for user ack to reboot primary fabric, disable fabric evacuation on subordinate.
7) wait for errors to stabilize and for all links that were expected to come up (documented in step 1) to do so
8) enable fabric evacuation for primary
9) validate with system owners no issues; wait 15 minutes
10) acknowledge fabric interconnect maintenance for primary; UCSM switches over and primary FI restarts
11) disable fabric evacuation on that FI
12) wait for errors to stabilize and all ports to return to their normal state
13) validate teaming/pathing is redundant again
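The "document port states, then check what came back" part of the procedure (steps 1, 7 and 12) can be sketched as a small diff in Python. This is a minimal illustration under assumptions: the port names and the "up"/"down" state strings are made up, and how you collect the two snapshots (by hand, an export, or a script) is left open. It is not the built-in baseline feature mentioned in step 1.

```python
from __future__ import annotations


def ports_not_restored(
    baseline: dict[str, str], current: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return ports that were 'up' in the baseline but are not 'up' now.

    Maps each affected port to (baseline_state, current_state). A port that is
    absent from the current snapshot is reported with the state 'missing'.
    Ports that were already down before the upgrade are ignored.
    """
    not_restored = {}
    for port, before in baseline.items():
        if before != "up":
            continue
        now = current.get(port, "missing")
        if now != "up":
            not_restored[port] = (before, now)
    return not_restored


if __name__ == "__main__":
    # Hypothetical snapshots taken in step 1 and again in step 7 or 12.
    baseline = {"A/1/17": "up", "A/1/18": "up", "A/1/19": "down"}
    current = {"A/1/17": "up", "A/1/18": "down", "A/1/19": "down"}
    for port, (before, now) in ports_not_restored(baseline, current).items():
        print(f"{port}: was {before}, now {now}")
```

An empty result is the signal to move on to the next step; anything listed is a link to chase before enabling evacuation on the other fabric or calling the upgrade done.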