UCS bare metal crash during FI planned reboot

subhakar77
Level 1

We are planning a Vblock 340 RCM upgrade from RCM 4.5.7 (firmware 2.1(3d)) to RCM 6.0.16 (firmware 2.2(8f)). As a preliminary step, VCE recommended rebooting the FIs one at a time.

 

When the first FI (the standby) was rebooted, all bare-metal blades that boot from SAN hard rebooted. They run different OSes (RHEL 5.9, RHEL 6.7, and Windows 2008 R2), most with PowerPath and some with native multipathing configured. All servers show 4 paths correctly in the OS, so we are not sure why the paths did not fail over to the primary FI.

 

VCE has not been helpful at all in investigating why the servers crashed. Can someone point to documentation on what the issue could be in this case? The MDS shows all 4 paths and proper zoning.

 

The OS vendors (Microsoft and Red Hat) could not find anything, because logs could not be written to disk while the paths were down and the OS crashed.

 

Is there anything that needs to be configured in UCSM or the service profile for faster path failover? Is there anything to fine-tune, such as increasing the timeout for a lost disk during path recovery? Any help would be much appreciated.

1 Reply

Wes Austin
Cisco Employee

If you are not using fabric failover, then your operating system is responsible for any kind of teaming, failover, or link-down handling. If you are using fabric failover, UCS will handle this.

 

You need to find out whether your VFC interfaces went down (unpinned) or not. I would not expect rebooting a single FI to bring down the VFCs on both the A-side and B-side fabrics, unless they are not configured correctly. If the VFCs stayed up and online, you need to investigate upstream.

 

If the VFCs went offline for all of your UCS blades when you rebooted a single FI, you have a problem.

 

I would open a TAC case and have them look at the VFCs on the problem hosts and confirm that they stayed pinned and were logged in (FLOGI) to the upstream switch when the issue occurred.

 

Based on the minimal information provided, I would say there was an issue with how multipathing/failover to the storage was handled.
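To check the OS side on the RHEL hosts before the next FI reboot, something like the sketch below could confirm that every FC HBA port reports "Online". It assumes the Linux sysfs layout `/sys/class/fc_host/hostN/port_state`; the function names `fc_host_states` and `all_hbas_online` are made up for illustration. It only looks at HBA link state, not at per-LUN paths in PowerPath or dm-multipath, so treat it as a first sanity check rather than proof that failover will work.

```python
from pathlib import Path


def fc_host_states(sysfs_root="/sys/class/fc_host"):
    """Return {host_name: port_state} for each FC HBA found under sysfs_root.

    Assumes the Linux FC transport class layout (hostN/port_state).
    Returns an empty dict if the directory does not exist.
    """
    root = Path(sysfs_root)
    states = {}
    if not root.is_dir():
        return states
    for host in sorted(root.iterdir()):
        state_file = host / "port_state"
        if state_file.is_file():
            states[host.name] = state_file.read_text().strip()
    return states


def all_hbas_online(states, minimum_hbas=2):
    """True only if at least minimum_hbas HBAs exist and all report 'Online'.

    minimum_hbas=2 is an assumption: one vHBA per fabric (A and B).
    """
    if len(states) < minimum_hbas:
        return False
    return all(state == "Online" for state in states.values())


if __name__ == "__main__":
    current = fc_host_states()
    for name, state in current.items():
        print(f"{name}: {state}")
    print("all HBAs online:", all_hbas_online(current))
```

If this reports an HBA that is not online while the OS still lists 4 paths, that mismatch would be worth including in the TAC case.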

 

P.S. I am not sure why you were advised to reboot your FIs before the upgrade; however, it's probably good that you found this before you were in the middle of the upgrade.
