
Disk Problems - Can we recover? (UCS C240 M4SX)

voip7372
Level 4

We have a UCS C240 M4SX that is out of coverage (no contract). The plan is to replace it with a new server, but for now we're stuck with it. The problem is that virtual disk 0 is down: two disks (1 and 2) were in a predicted failure state (but still online), and someone swapped out both of them at the same time. As I understand it, we would have been OK if only one of the disks had been replaced first, and the other predicted-to-fail disk had been replaced only AFTER the rebuild finished (which can take several hours).

Both of the replacement disks showed 'foreign data'. My idea was to have someone remove ONE of the new disks and reinsert just ONE of the original disks, hoping that if the old disks had not really failed yet, having 4 originals (there are 5 disks in the RAID/virtual disk) would bring up the virtual disk and allow the server to rebuild the one new disk still in the server. The issue is that the original disk that was reinserted shows 'unconfigured good' as the State (and 'Moderate fault' for the Health). So we reinserted the other original disk as well (all originals are now inserted), but both original disks that were removed and then reinserted (after a few days) show a state of 'unconfigured good' at this time.

Is there any hope of recovering from this without losing the VMs we had on that virtual disk/datastore? If so, what could we try?

Correct me if I'm wrong, but my understanding is that RAID 5 can only tolerate one disk failure at a time. That's why I was hoping we could reinsert the old disks, which were in the 'Online' State before removal ('predicted failure' for Status), and get the virtual disk back online. Then we could let it rebuild the first new disk, swap out the second original disk that was showing errors, and let that second new disk rebuild as well.
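
As a rough illustration of why RAID 5 tolerates only one missing member, here is a minimal Python sketch of single-parity reconstruction. It is a simplification and not how the controller in this server is implemented: real RAID 5 rotates parity across the disks and works on much larger strips. The helper names (`xor_blocks`, `rebuild_stripe`) and the sample data are hypothetical, chosen only for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe):
    """Return the stripe with a single missing block (None) reconstructed.

    Raises ValueError when more than one block is missing, because one
    parity block only carries enough information to recover one member.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(
            f"{len(missing)} blocks missing; single parity can rebuild only one"
        )
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    # XOR of all surviving blocks (data and parity) equals the missing block.
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Hypothetical 5-member stripe: four data blocks plus one parity block.
data = [b"VM-A", b"VM-B", b"VM-C", b"VM-D"]
stripe = data + [xor_blocks(data)]

# One member lost: the survivors are enough to recompute it.
one_lost = stripe.copy()
one_lost[1] = None
assert rebuild_stripe(one_lost) == stripe

# Two members lost (like pulling two disks together): not recoverable.
two_lost = stripe.copy()
two_lost[1] = None
two_lost[2] = None
try:
    rebuild_stripe(two_lost)
    recoverable = True
except ValueError:
    recoverable = False
```

With one block missing, XOR-ing the four survivors gives back the lost block. With two missing, there is one equation and two unknowns, which is why the second disk should only be swapped after the first rebuild completes.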

I think the biggest problem was that the person who swapped out the first disk should have waited several hours for the rebuild to finish before replacing the second disk that was having errors. He didn't realize the rebuild may take hours, so he swapped out the other disk right away. I think that's the point where the virtual disk went offline, and now we're in this situation, hoping there's a way to recover without rebuilding the VMs that were on this virtual disk/datastore.

This is the status BEFORE any changes were made and then the status after the two original disks were reinserted a few days later:

[Screenshots: 1-before-changes.jpg, 2-after-remove-reinsert.jpg, 3-virtual-disk-after-remove-reinsert.jpg]

18 Replies

For step 2, you go to "Controller Info" and select "Create a Virtual Drive from Unused Disks". You may need to set the disks to unconfigured good after deleting the VD, but that should happen automatically.

The VD will also run background initialization automatically; you do not need to do this manually.

For step 4, you may need to add CD/DVD to the boot order. Press F6 and select the KVM-mapped CD/DVD if you map the ISO via Virtual Media. I would just use virtual media, but if you use a flash drive, make sure that you burn the installer to the flash drive; you can't just copy/paste the ISO. There are lots of videos on YouTube on how to map a CD/DVD via virtual media and install ESXi.

Yes to steps 5 and 6. 

Got it. Thanks again for all your help and quick replies!

And ah yes, I remember now/know what you mean about using the KVM in CIMC and virtual media because that's how I've updated VMware in the past (but I don't do this often, so it slipped my mind). 

I think we have what we need to go on now. 

Looks like I have one final question. Has anyone had success actually finding ESXi downloads on Broadcom's website??? Broadcom bought VMware and I've been poking around on their (horrific) website trying in vain to even find ESXi to download a copy. I have a login (I made sure of that because I had a VMware account), but now it directs me to the Broadcom site when I try to find the downloads on the old VMware website. This is terrible.

EDIT: I think I found the 'directions' to get to the right place on their website, but it's still not behaving for me, so I'll keep trying (some menus aren't loading correctly).  https://knowledge.broadcom.com/external/article/366685/instructions-to-find-oem-custom-images-a.html

Edit #2: This is very frustrating. Broadcom's website is horrible. I had to use Firefox to get the menus to load correctly so I could reach the download area. THEN it forced me to update my information (it was incomplete) before it would let me download. BUT my profile originally had only the first letter of my last name, plus the full company name they imported from VMware when my account was created, and that name is more than 40 characters long, which apparently is over the limit Broadcom allows...and...guess what?...you can't edit this info. It's grayed out. I opened another tab and went to my profile to see if I could edit it there, but the company name isn't even in the profile (no field for it) and it won't let me update my last name. There's no address info there either. Unbelievable. Fortunately, I found a copy of the ESXi installer on another old server we have, so I downloaded it from that server's datastore. *sigh*

[Screenshot: voip7372_0-1715883119410.png]

 

This is all finished and working. I'll add one important note in case anyone finds this discussion later and has the same problem. After deleting and recreating Virtual Drive 0 in CIMC, I failed to go back to that virtual drive and mark it as the Boot Drive. So, after installing ESXi via the KVM on the recreated virtual drive and rebooting, the server wouldn't boot. It couldn't find anything to boot from. I panicked, but eventually figured out (with help from a colleague) that marking the virtual drive as the boot drive was what needed to be done to fix it. It's a very simple task and I overlooked it, but boy it can cause some panic :-).

I made screenshots of the virtual switch and port groups in the networking section of VMware BEFORE taking the server down to rebuild it (because remember, it was still up, with VMware apparently just running in memory after our disk failure and improper swap out). That let me replicate exactly the same setup in the rebuilt ESXi BEFORE importing/registering the existing VMs that were on the disks/datastore that was still working fine, so the VMs came up just like before the rebuild. No changes were needed and it was super easy/fast to get them back up.

Marking the boot drive in CIMC: Storage > Raid Controller > Virtual Drive Info

[Screenshot: voip7372_0-1715985235073.png]
