01-04-2012 02:07 PM - edited 03-01-2019 10:12 AM
Environment:
One chassis, 6 blades, B200-M2
Two Fabric interconnects, 6120XP
Old firmware: 1.3.1n
New firmware: 1.4.3s
Problem:
When we activated the UCS Manager firmware (going from 1.3 to 1.4), all 6 blades rebooted unexpectedly, which does not match what the document says.
We did these steps:
1.) Update Adapter firmware
2.) Activate Adapter firmware one by one (downtime, but we can put the ESXi hosts in maintenance mode)
3.) Update CIMC firmware
4.) Activate CIMC firmware
5.) Update IO Module firmware
6.) Activate IO module firmware, but "Set Startup Version Only"
7.) Activate UCS Manager firmware <-- the problem occurs
We want to keep the VMs running during the firmware upgrade.
Downtime for one blade/ESXi host is acceptable, but downtime for all the blades at once is not.
Does anyone know what causes the blades to reboot when activating the UCS Manager firmware? According to the release documentation, only GUI and CLI sessions should be affected.
Thanks a lot and appreciate your help!
--Vincent
01-04-2012 02:55 PM
Unless I am missing something, you read the wrong doc. You should have used the one covering 1.3 to 1.4, but your link points to the 1.4 to 1.4 guide. Please clarify whether what we are reading is correct.
Sent from Cisco Technical Support iPad App
01-04-2012 03:13 PM
Thanks Reginald
Do we have to upgrade from 1.3.1n to 1.4.1 first and then to 1.4.3?
We upgraded directly from 1.3.1n to 1.4.3s; sorry if we missed that information in one of the documents.
So we will try 1.3.1n -> 1.4.1m -> 1.4.3s.
Is this correct?
Thanks
Vincent
01-04-2012 03:42 PM
Vincent,
You can upgrade directly from one release to another. Each release has its own steps to follow; see the link below for each release. You will want to use the 1.3 to 1.4 guide and follow it step by step.
http://www.cisco.com/en/US/products/ps10281/prod_installation_guides_list.html
Hope this helps.
01-04-2012 05:23 PM
I have now run into another issue: only one fabric interconnect got the new version successfully, while the other is still running the old version. As a result, the cluster IP is not pingable and UCS Manager is not accessible.
Previous firmware: 1.4.3s
New firmware: 1.4.1m
After activating UCS Manager, from the CLI of a fabric interconnect:
--------------------------------------
sdeucs-B# show cluster state
Cluster Id: 0xcfa2f2725b8811xxxxxxxxxx00059b790004
Incompatible versions:
local: 1.4(1m), peer: 1.4(3.0)
B: UP, ELECTION IN PROGRESS (Management services: UP)
A: UP, ELECTION IN PROGRESS (Management services: UNRESPONSIVE)
HA NOT READY
Management services are unresponsive on peer Fabric Interconnect
No device connected to this Fabric Interconnect
--------------------------------------
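For anyone scripting health checks around an upgrade, output like the transcript above can be parsed mechanically. Here is a minimal Python sketch; the dictionary keys and the specific substrings it looks for are my own assumptions based on the pasted output, not an official Cisco API:

```python
import re

def parse_cluster_state(output: str) -> dict:
    """Parse `show cluster state` text into a small status dict (illustrative)."""
    state = {
        # The sample output above flags a version mismatch explicitly.
        "incompatible_versions": "Incompatible versions" in output,
        # Healthy clusters report "HA READY"; the broken one above says "HA NOT READY".
        "ha_ready": "HA NOT READY" not in output and "HA READY" in output,
        "election_in_progress": "ELECTION IN PROGRESS" in output,
    }
    m = re.search(r"local:\s*(\S+),\s*peer:\s*(\S+)", output)
    if m:
        state["local_version"], state["peer_version"] = m.group(1), m.group(2)
    return state

# The transcript from the post above, used as a sample input.
sample = """\
Cluster Id: 0xcfa2f2725b8811xxxxxxxxxx00059b790004
Incompatible versions:
local: 1.4(1m), peer: 1.4(3.0)
B: UP, ELECTION IN PROGRESS (Management services: UP)
A: UP, ELECTION IN PROGRESS (Management services: UNRESPONSIVE)
HA NOT READY
"""

status = parse_cluster_state(sample)
print(status["incompatible_versions"], status["ha_ready"])  # True False
```

A check like this, run before activating UCSM, would have surfaced the version mismatch without needing the GUI.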
Only one fabric interconnect downgraded successfully; the other did not. This caused us to lose both connectivity and management.
Any hints here?
01-05-2012 08:20 AM
I would open a TAC case to get some visibility on this.
Sent from Cisco Technical Support iPad App
01-05-2012 09:10 AM
There is a full video guide for the 1.3.x to 1.4.x upgrade at the site below.
Updating UCSM certainly should not cause any disruption other than having to restart your user session, assuming your FIs were correctly clustered and HA was in an operational state prior to the UCSM upgrade.
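That pre-check is easy to make concrete: from either FI's CLI, the same `show cluster state` command used earlier in this thread should report HA READY before you activate UCSM. A sketch of healthy output (cluster ID and roles are illustrative, not taken from this system):

```
UCS-A# show cluster state
Cluster Id: 0x...
A: UP, PRIMARY
B: UP, SUBORDINATE

HA READY
```

If it shows ELECTION IN PROGRESS or HA NOT READY instead, sort that out before touching firmware.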
Regards
Colin
01-05-2012 11:37 AM
Hi Reginald
I have opened a TAC case: # 620261865
Can you help?
Thanks
01-08-2012 09:55 AM
Yong,
Was this resolved with TAC?
Sent from Cisco Technical Support iPhone App
01-10-2012 03:03 PM
Almost done.
The first issue (the unexpected reboots) was caused by a bug:
http://cdetsweb-prd.cisco.com/apps/dumpcr?identifier=CSCtu17091&parentprogram=QDDTS
Brief summary of the bug:
If you upgrade firmware from 1.3.x directly to 1.4.3s, the blades reboot unexpectedly when you activate UCS Manager.
The workaround is to upgrade from 1.3.x to 1.4.3r first, and then to 1.4.3s.
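That workaround can be written down as a tiny pre-flight rule. The helper below is hypothetical (there is no such Cisco tool); it just encodes the path described above, inserting the 1.4.3r hop when coming from 1.3.x:

```python
# Hypothetical pre-flight helper encoding the CSCtu17091 workaround:
# a direct 1.3.x -> 1.4.3s activation triggers the blade reboots, so
# hop through 1.4.3r first. Version strings are illustrative.

def upgrade_path(current: str, target: str) -> list:
    """Return the sequence of versions to activate, in order."""
    if current.startswith("1.3") and target == "1.4.3s":
        # Affected path: insert the intermediate release.
        return [current, "1.4.3r", "1.4.3s"]
    return [current, target]

print(upgrade_path("1.3.1n", "1.4.3s"))  # ['1.3.1n', '1.4.3r', '1.4.3s']
print(upgrade_path("1.4.3r", "1.4.3s"))  # ['1.4.3r', '1.4.3s']
```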
The second weird issue (only FI-B got activated while FI-A was always stuck) was caused, according to the TAC engineer, by a corrupted management database on FI-A. We had to rebuild FI-A and its cluster peer FI-B from scratch (erase all configuration and re-initialize the system) to fix it. Now it works fine.
According to the TAC engineer, they cannot explain why the corruption happened or how to monitor for it.