
Cisco UCS: Getting error "Consistency Check Failed: please check the controller"


 


Introduction


In a UCS system each fault represents a failure in the Cisco UCS instance or an alarm threshold that has been raised. Each fault includes information about the operational state of the affected object at the time the fault was raised. If the fault is transitional and the failure is resolved, then the object transitions to a functional state. You can view all faults in the Cisco UCS instance from either the Cisco UCS Manager CLI or the Cisco UCS Manager GUI. You can also configure the fault collection policy to determine how a Cisco UCS instance collects and retains faults. All Cisco UCS faults can be trapped by SNMP.

This document describes a scenario in which the following alarm is raised on a C220 server:
"Storage Virtual Drive 0 Consistency Check Failed: please check the controller, or reseat the physical drives"

 

RAID Configuration

The RAID Configuration section of the Cisco UCS Server Configuration Utility (SCU) user guide describes how to configure your system RAID controllers. The RAID levels supported by SCU are RAID 0, 1, 5, and 6. If your system has multiple RAID controllers, SCU displays a list of all available RAID devices. This feature is described in the Server Configuration section.

For more information, refer to:
http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/sw/ucsscu/user/guide/30/UCS_SCU.html


Determining Controller in the Server 

You can use the Inventory tab in the Cisco UCS Manager GUI to determine which controller is installed in a server. CIMC provides similar functionality.

There is a dedicated SAS riser slot for the RAID controller card in a C-series chassis. There is also a mounting point inside the chassis for the optional RAID battery backup unit that is available when using the appropriate LSI controller.


Server Disk Drive Monitoring

The disk drive monitoring for Cisco UCS provides Cisco UCS Manager with blade-resident disk drive status for supported blade servers in a Cisco UCS domain. Disk drive monitoring provides a unidirectional fault signal from the LSI firmware to Cisco UCS Manager to provide status information.

The following server and firmware components gather, send, and aggregate information about the disk drive status in a server:
Physical presence sensor—Determines whether the disk drive is inserted in the server drive bay. 
Physical fault sensor—Determines the operability status reported by the LSI storage controller firmware for the disk drive. 
IPMI disk drive fault and presence sensors—Sends the sensor results to Cisco UCS Manager. 
Disk drive fault LED control and associated IPMI sensors—Controls disk drive fault LED states (on/off) and relays the states to Cisco UCS Manager. 

Cisco UCS Manager cannot monitor disk drives in blade servers outside the supported set, or in rack-mount servers.

 

Logs on RAID Controller

The following logs are seen on the controller:


Tue Mar 24 07:43:11 2015 Fatal     Puncturing bad block on PD 0d(e0xfc/s6) at b8b791b
Tue Mar 24 07:43:11 2015 Fatal     Unrecoverable medium error during recovery on PD 0d(e0xfc/s6) at b8b791b
Tue Mar 24 07:43:11 2015 Info      "Unexpected sense: PD 0d(e0xfc/s6) Path 50000395483a8a2e, CDB: 28 00 0b 8b 79 1b 00 00 01 00, Sense: 3/14/00"
Tue Mar 24 07:43:11 2015 Info      "Unexpected sense: PD 10(e0xfc/s1) Path 50000395b83275be, CDB: 28 00 0b 8b 79 00 00 01 00 00, Sense: 3/14/00"
Tue Mar 24 07:43:11 2015 Info      "Unexpected sense: PD 10(e0xfc/s1) Path 50000395b83275be, CDB: 28 00 0b 8b 79 00 00 01 00 00, Sense: 3/14/00"
Tue Mar 24 07:43:11 2015 Info      "Unexpected sense: PD 0d(e0xfc/s6) Path 50000395483a8a2e, CDB: 28 00 0b 8b 79 00 00 01 00 00, Sense: 3/14/00"
Tue Mar 24 07:43:11 2015 Info      "Unexpected sense: PD 0d(e0xfc/s6) Path 50000395483a8a2e, CDB: 28 00 0b 8b 79 00 00 01 00 00, Sense: 3/14/00"
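The "Unexpected sense" entries above can be decoded to see which block each read targeted. The following is a minimal Python sketch, not a Cisco or LSI tool. It assumes the CDB is a 10-byte SCSI READ(10) command (opcode 28h, with the logical block address in bytes 2 to 5 and the transfer length in bytes 7 to 8) and that the `Sense:` field is printed as sense key/ASC/ASCQ in hexadecimal. The function name `decode_unexpected_sense` and the regular expression are illustrative only.

```python
import re

# Subset of SCSI sense key names; key 3h is MEDIUM ERROR.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
}

# Assumed layout of an "Unexpected sense" log entry, based on the lines above.
LINE_RE = re.compile(
    r"PD (?P<pd>[0-9a-f]+)\(e(?P<encl>0x[0-9a-f]+)/s(?P<slot>\d+)\).*?"
    r"CDB: (?P<cdb>[0-9a-f ]+), "
    r"Sense: (?P<sense>[0-9a-f]+/[0-9a-f]+/[0-9a-f]+)"
)


def decode_unexpected_sense(line):
    """Return the decoded fields of one log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    cdb = bytes.fromhex(m.group("cdb"))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("this sketch only handles READ(10) CDBs")
    key, asc, ascq = (int(part, 16) for part in m.group("sense").split("/"))
    return {
        "pd": m.group("pd"),
        "slot": int(m.group("slot")),
        "lba": int.from_bytes(cdb[2:6], "big"),     # logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length
        "sense_key": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": asc,
        "ascq": ascq,
    }


if __name__ == "__main__":
    sample = ('Tue Mar 24 07:43:11 2015 Info      "Unexpected sense: '
              'PD 0d(e0xfc/s6) Path 50000395483a8a2e, '
              'CDB: 28 00 0b 8b 79 1b 00 00 01 00, Sense: 3/14/00"')
    info = decode_unexpected_sense(sample)
    print(f"slot {info['slot']}: LBA {info['lba']:x}, "
          f"{info['blocks']} block(s), {info['sense_key']}")
```

Under these assumptions, the first entry decodes to a single-block read at LBA b8b791b, the same address reported in the two Fatal lines for the drive in slot 6.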

The status of the physical disks and the virtual drive is as follows:

Physical Drive Number Controller Health         Status                 Manufacturer   Model          Predictive Failure Count Drive Firmware Coerced Size   Type
--------------------- ---------- -------------- ---------------------- -------------- -------------- ------------------------ -------------- -------------- -----
1                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD
2                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD
3                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD
4                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD
5                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD
6                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD
7                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD
8                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD

Virtual Drive Health         Status               Name             Size       RAID Level Boot Drive
------------- -------------- -------------------- ---------------- ---------- ---------- ----------
0             Good           Optimal                               1996036 MB RAID 5     true
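When checking many servers, a fixed-width status listing like the one above can be screened with a script instead of being read by eye. The following is a minimal Python sketch that assumes the dashed ruler line marks the column boundaries, as it does in the output above. The function names `parse_fixed_width` and `drives_needing_attention` are hypothetical, and the "needs attention" rule (health other than Good, status other than Online, or a nonzero predictive failure count) is an illustrative choice, not a Cisco-defined criterion.

```python
import re

HEADER = "Physical Drive Number Controller Health         Status                 Manufacturer   Model          Predictive Failure Count Drive Firmware Coerced Size   Type"
RULER = "--------------------- ---------- -------------- ---------------------- -------------- -------------- ------------------------ -------------- -------------- -----"
ROWS = [
    "1                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD",
    "2                     SLOT-2     Good           Online                 TOSHIBA        MK3001GRRB     0                        5702           285148 MB      HDD",
]


def parse_fixed_width(header, ruler, rows):
    """Split fixed-width rows into dicts, using the dashed ruler for column bounds."""
    spans = [(m.start(), m.end()) for m in re.finditer(r"-+", ruler)]
    names = [header[start:end].strip() for start, end in spans]
    return [
        {name: row[start:end].strip() for name, (start, end) in zip(names, spans)}
        for row in rows
    ]


def drives_needing_attention(drives):
    """Return the drive numbers that fail the illustrative screening rule."""
    return [
        d["Physical Drive Number"]
        for d in drives
        if d["Health"] != "Good"
        or d["Status"] != "Online"
        or int(d["Predictive Failure Count"]) != 0
    ]


if __name__ == "__main__":
    drives = parse_fixed_width(HEADER, RULER, ROWS)
    flagged = drives_needing_attention(drives)
    print("Drives needing attention:", flagged or "none")
```

Applied to the rows shown in this document, the rule flags no drives, because every drive reports Good, Online, and a predictive failure count of 0.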

 

Resolution


In this case, the error message is related to bug CSCue84667. This is a cosmetic bug, and the log entry clears once the Consistency Check (CC) operation completes. While a Consistency Check is running via MSM or MegaCli, the Fault Engine reports "Consistency Check Failed" even though the CC operation is still in progress and has not actually failed. The fault description is therefore misleading.

 

Related Information


Cisco UCS Servers RAID Guide

F1010 on C series running 1.5(4)