Solved: C240 M4, LSI MegaRAID SAS 3108 not rebuilding raid from policy on disk replacement?

dietsoda · ‎02-23-2018

I have a storage profile policy pushed down from UCS Central to a C240 M4. The storage policy dictates that each of the 12 SAS disks controlled by the LSI MegaRAID SAS 3108 is its own virtual RAID0, for a highly available application that just needs striping.

Recently a disk went bad. I opened a TAC case and sent them a support bundle from the server where the disk was showing inoperable; they sent out a replacement disk and we installed it. Once installed, it defaulted to JBOD mode, which I thought was weird since the storage profile policy explicitly states a disk of its type should be a RAID0. I thought that maybe the policy just won't push down to a JBOD disk, so I went ahead and marked it as unconfigured good (it seems this is the only state the policy will actually get pushed down to), but nothing happened: the disk just stays in the unconfigured good state and never changes. Looking through Central and Manager, I don't see any option to "re-apply" the policy to the disk or force it to RAID.

This is a bit worrisome, as over the next few years I'm sure we will lose some disks, and I'm not exactly sure how to make this policy apply automatically, or even apply at all.

The storage profile is protection enabled, so I'd guess I could remove the policy and then reapply it, and hopefully it would pick up the unconfigured good disk. But this seems like a bad option: at our company, even doing that would require an approved change, and it would really drag out the simple task of bringing a disk back online.

To note, the disk that went bad was RAID'd correctly.

Does anyone see anything I'm missing, or have any advice on how to get this policy pushed down? I would have thought this would have been automatic.

Thanks in advance!!!

Kirk J · ‎02-25-2018

Greetings.

The storage profile is applied when the service profile is applied (and checked during re-ack of blade/server).

I had a customer a while back who hit a similar issue with a Hadoop setup and found it was less than convenient to deal with a failed RAID0 VD.

I filed an enhancement request https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvh30116/?reffering_site=dumpcr asking for a mechanism in the UCSM GUI to re-init the VD without having to re-ack or disassociate.

You can force init the VD from the storcli utility, but that's a lot of work for something that should be available in the UCSM/UCS Central GUI.
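A rough sketch of what that storcli route might look like, written as a small Python helper that only builds the command lines and does not run them. The controller index (0), enclosure ID (252), and slot (5) are placeholder assumptions, not values from this thread, and the binary may be named `storcli64` on some systems. It also assumes a RAID0 VD created by hand is acceptable alongside the storage profile, which is not confirmed here.

```python
from typing import List


def raid0_recreate_commands(controller: int, enclosure: int, slot: int,
                            storcli: str = "storcli") -> List[List[str]]:
    """Build (but do not run) storcli commands for a replaced single-disk RAID0.

    All IDs are supplied by the caller; nothing is discovered automatically.
    """
    drive = f"/c{controller}/e{enclosure}/s{slot}"
    return [
        # 1. Inspect the replacement drive's current state (JBOD, UGood, ...).
        [storcli, drive, "show"],
        # 2. Move a JBOD drive to Unconfigured Good.
        [storcli, drive, "set", "good", "force"],
        # 3. Create a single-drive RAID0 virtual drive on it.
        [storcli, f"/c{controller}", "add", "vd", "type=raid0",
         f"drives={enclosure}:{slot}"],
    ]


if __name__ == "__main__":
    # Placeholder IDs: controller 0, enclosure 252, slot 5.
    for cmd in raid0_recreate_commands(0, 252, 5):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them against the actual enclosure and slot IDs (from `storcli /c0 show`) before running anything against a production controller.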

Thanks,

Kirk...

Fabián Ramírez · ‎02-23-2018

Hello,

Just to clarify: remember that a RAID 0 virtual drive is not going to rebuild, because there's nothing to rebuild.

You could try creating the Storage Policy for that particular server from scratch just to be sure everything is on point:

https://www.cisco.com/c/en/us/support/docs/servers-unified-computing/ucs-manager/200970-configuring-storage-profiles-for-c-serie.html#anc6

If that does not work, the LSI RAID controller WebBIOS is an option; you can try creating the virtual drive from there.

dietsoda · ‎02-23-2018

Thanks! I agree that there isn't a full RAID to rebuild; this application just needs access to the RAID cache and is highly available. However, why would Central not recreate the virtual RAID 0 on a disk it doesn't see its policy applied to? I understand why it would not apply a configuration to a JBOD disk, but when the disk is in an unconfigured good state, I would think the storage profile policy would take effect.

During OS install, we mark all disks as unconfigured good so Windows doesn't read them, and we use an SSD RAID1 policy in the storage profile for boot.

When the install is complete, we change the storage profile policies to include the SAS drives as RAID0. Central pushes the policy the second it detects unconfigured good disks that don't have the policy applied to them, without a reboot, and RAIDs them live within Windows.

I'm failing to understand why this isn't taking place for replacement disks that don't meet the storage profile policy.

Kirk J · ‎02-25-2018