Cisco UCS: Getting error "Consistency Check Failed: please check the controller"

Sandeep Singh · 03-31-2015

Introduction
RAID Configuration
Determining Controller in the Server
Server Disk Drive Monitoring
Logs on RAID Controller
Resolution
Related Information

Introduction

In a UCS system each fault represents a failure in the Cisco UCS instance or an alarm threshold that has been raised. Each fault includes information about the operational state of the affected object at the time the fault was raised. If the fault is transitional and the failure is resolved, then the object transitions to a functional state. You can view all faults in the Cisco UCS instance from either the Cisco UCS Manager CLI or the Cisco UCS Manager GUI. You can also configure the fault collection policy to determine how a Cisco UCS instance collects and retains faults. All Cisco UCS faults can be trapped by SNMP.

This document describes a scenario in which the following alarm is raised on a C220 server:
"Storage Virtual Drive 0 Consistency Check Failed: please check the controller, or reseat the physical drives"

RAID Configuration

You can use the RAID Configuration section in the Cisco UCS Server Configuration Utility (SCU) document to configure your system RAID controllers. The RAID levels supported by SCU are RAID 0, 1, 5, and 6. If your system has multiple RAID controllers, SCU displays a list of all available RAID devices. This feature is described in the Server Configuration section.

For more information, refer to:
http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/sw/ucsscu/user/guide/30/UCS_SCU.html

Determining Controller in the Server

You can use the Inventory tab in the Cisco UCS Manager GUI to determine which controller is installed in a server. CIMC provides similar functionality.

There is a dedicated SAS riser slot for the RAID controller card in a C-series chassis. There is also a mounting point inside the chassis for the optional RAID battery backup unit that is available when using the appropriate LSI controller.

Server Disk Drive Monitoring

The disk drive monitoring for Cisco UCS provides Cisco UCS Manager with blade-resident disk drive status for supported blade servers in a Cisco UCS domain. Disk drive monitoring provides a unidirectional fault signal from the LSI firmware to Cisco UCS Manager to provide status information.

The following server and firmware components gather, send, and aggregate information about the disk drive status in a server:
Physical presence sensor—Determines whether the disk drive is inserted in the server drive bay.
Physical fault sensor—Determines the operability status reported by the LSI storage controller firmware for the disk drive.
IPMI disk drive fault and presence sensors—Sends the sensor results to Cisco UCS Manager.
Disk drive fault LED control and associated IPMI sensors—Controls disk drive fault LED states (on/off) and relays the states to Cisco UCS Manager.

Cisco UCS Manager cannot monitor disk drives in any other blade server or rack-mount server.

Logs on RAID Controller

The following logs are seen on the controller:

Tue Mar 24 07:43:11 2015 Fatal Puncturing bad block on PD 0d(e0xfc/s6) at b8b791b
Tue Mar 24 07:43:11 2015 Fatal Unrecoverable medium error during recovery on PD 0d(e0xfc/s6) at b8b791b
Tue Mar 24 07:43:11 2015 Info "Unexpected sense: PD 0d(e0xfc/s6) Path 50000395483a8a2e, CDB: 28 00 0b 8b 79 1b 00 00 01 00, Sense: 3/14/00"
Tue Mar 24 07:43:11 2015 Info "Unexpected sense: PD 10(e0xfc/s1) Path 50000395b83275be, CDB: 28 00 0b 8b 79 00 00 01 00 00, Sense: 3/14/00"
Tue Mar 24 07:43:11 2015 Info "Unexpected sense: PD 10(e0xfc/s1) Path 50000395b83275be, CDB: 28 00 0b 8b 79 00 00 01 00 00, Sense: 3/14/00"
Tue Mar 24 07:43:11 2015 Info "Unexpected sense: PD 0d(e0xfc/s6) Path 50000395483a8a2e, CDB: 28 00 0b 8b 79 00 00 01 00 00, Sense: 3/14/00"
Tue Mar 24 07:43:11 2015 Info "Unexpected sense: PD 0d(e0xfc/s6) Path 50000395483a8a2e, CDB: 28 00 0b 8b 79 00 00 01 00 00, Sense: 3/14/00"
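The "Unexpected sense" entries above can be read more easily once the CDB and sense fields are decoded. The following is a minimal Python sketch, not a Cisco or LSI tool. It assumes the CDB is a standard SCSI READ(10) command (opcode 0x28, logical block address in bytes 2-5, transfer length in bytes 7-8) and that the Sense field is printed as sense key/ASC/ASCQ in hexadecimal. The function names decode_read10 and parse_sense_line are hypothetical helpers introduced for illustration only.

```python
import re

# Standard SCSI sense keys (only the first few are listed here).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# Assumed layout of an "Unexpected sense" line, based on the samples above.
SENSE_LINE = re.compile(
    r"PD (?P<pd>[0-9a-f]+)\(e(?P<encl>0x[0-9a-f]+)/s(?P<slot>\d+)\)"
    r".*CDB: (?P<cdb>[0-9a-f ]+), "
    r"Sense: (?P<key>[0-9a-f]+)/(?P<asc>[0-9a-f]+)/(?P<ascq>[0-9a-f]+)"
)


def decode_read10(cdb_hex):
    """Return (start LBA, block count) from a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


def parse_sense_line(line):
    """Decode one 'Unexpected sense' log line; return None for other lines."""
    m = SENSE_LINE.search(line)
    if m is None:
        return None
    lba, blocks = decode_read10(m["cdb"])
    return {
        "slot": int(m["slot"]),
        "lba": lba,
        "blocks": blocks,
        "sense_key": SENSE_KEYS.get(int(m["key"], 16), "UNKNOWN"),
    }


# One of the log lines from this document.
sample = ('Tue Mar 24 07:43:11 2015 Info "Unexpected sense: PD 0d(e0xfc/s6) '
          'Path 50000395483a8a2e, CDB: 28 00 0b 8b 79 00 00 01 00 00, '
          'Sense: 3/14/00"')
event = parse_sense_line(sample)

# Block address reported in the "Puncturing bad block" entry.
punctured = 0xB8B791B
in_range = event["lba"] <= punctured < event["lba"] + event["blocks"]
print(event, in_range)
```

Under these assumptions, the 256-block reads that start at 0xb8b7900 cover the punctured block at 0xb8b791b, and sense key 3 corresponds to a medium error. This is consistent with the two "Fatal" entries logged for the same drive slot.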

The status of the physical disks is as follows:

Physical Drive Number  Controller  Health  Status  Manufacturer  Model       Predictive Failure Count  Drive Firmware  Coerced Size  Type
---------------------  ----------  ------  ------  ------------  ----------  ------------------------  --------------  ------------  ----
1                      SLOT-2      Good    Online  TOSHIBA       MK3001GRRB  0                         5702            285148 MB     HDD
2                      SLOT-2      Good    Online  TOSHIBA       MK3001GRRB  0                         5702            285148 MB     HDD
3                      SLOT-2      Good    Online  TOSHIBA       MK3001GRRB  0                         5702            285148 MB     HDD
4                      SLOT-2      Good    Online  TOSHIBA       MK3001GRRB  0                         5702            285148 MB     HDD
5                      SLOT-2      Good    Online  TOSHIBA       MK3001GRRB  0                         5702            285148 MB     HDD
6                      SLOT-2      Good    Online  TOSHIBA       MK3001GRRB  0                         5702            285148 MB     HDD
7                      SLOT-2      Good    Online  TOSHIBA       MK3001GRRB  0                         5702            285148 MB     HDD
8                      SLOT-2      Good    Online  TOSHIBA       MK3001GRRB  0                         5702            285148 MB     HDD

Virtual Drive  Health  Status   Name  Size        RAID Level  Boot Drive
-------------  ------  -------  ----  ----------  ----------  ----------
0              Good    Optimal        1996036 MB  RAID 5      true

Resolution

In this case, the error message is related to bug CSCue84667. This is a cosmetic bug, and the log entry clears once the Consistency Check operation completes. While a Consistency Check (CC) is running via MSM/MegaCli, the Fault Engine reports "Consistency Check Failed" even though the CC operation is still in progress and has not actually failed. The fault description is therefore somewhat misleading.
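Because the fault is expected to clear once the Consistency Check finishes, it can help to confirm whether a check is still running. The sketch below is one possible way to do this from Python and is not part of the original case. It assumes MegaCli is installed on the host and that the binary is named MegaCli64 and is on the PATH (both vary by platform and install). The helper names cc_progress_command and show_cc_progress are hypothetical.

```python
import subprocess


def cc_progress_command(virtual_drive=0, adapter=0, megacli="MegaCli64"):
    """Build the MegaCli call that shows Consistency Check progress.

    The binary name is an assumption; adjust it to match the local install.
    """
    return [megacli, "-LDCC", "-ShowProg", f"-L{virtual_drive}", f"-a{adapter}"]


def show_cc_progress(virtual_drive=0, adapter=0, megacli="MegaCli64"):
    """Run the command and return its raw text output.

    Raises FileNotFoundError if the MegaCli binary is not found.
    The output is returned unparsed because its format may differ
    between MegaCli versions.
    """
    cmd = cc_progress_command(virtual_drive, adapter, megacli)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


# Print the command for Virtual Drive 0 on adapter 0 without executing it.
print(" ".join(cc_progress_command(0, 0)))
```

Call show_cc_progress() only on a host where MegaCli is available. If the output shows the check is still in progress, the "Consistency Check Failed" fault matches the cosmetic behavior described above.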

Related Information

Cisco UCS Servers RAID Guide

F1010 on C series running 1.5(4)