Cisco UCS Manager Critical Error

Hamidsattarrana
Level 1

Hello Everyone,

I am receiving the following error on Cisco UCS Manager,

 


<faultInst
ack="yes"
cause="health-led-amber-blinking"
changeSet=""
code="F1236"
created="2023-06-13T21:20:13Z"
descr="sys/rack-unit-4/health-led shows error. Reason DDR4_P1_A1_ECC:Sensor Threshold Crossed; "
highestSeverity="critical"
id="1696528"
lastTransition="2023-06-13T21:23:27Z"
lc="none"
occur="2"
origSeverity="critical"
prevSeverity="cleared"
rule="equipment-health-led-critical-error"
severity="critical"
tags="server"
type="equipment"
dn="sys/rack-unit-4/health-led/fault-F1236"
status="created"
sacl="addchild,del,mod">
</faultInst>
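
For anyone who wants to pull the key fields out of a fault export like the one above, here is a minimal Python sketch using only the standard library. The XML string is a trimmed copy of the fault above, and `summarize_fault` is a name made up for this illustration; it is not part of any Cisco tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fault shown above (only a few attributes kept).
FAULT_XML = '''<faultInst
ack="yes"
cause="health-led-amber-blinking"
code="F1236"
descr="sys/rack-unit-4/health-led shows error. Reason DDR4_P1_A1_ECC:Sensor Threshold Crossed; "
severity="critical"
dn="sys/rack-unit-4/health-led/fault-F1236">
</faultInst>'''


def summarize_fault(xml_text):
    """Return code, severity, DN, and trimmed description (hypothetical helper)."""
    elem = ET.fromstring(xml_text)
    return {
        "code": elem.get("code"),
        "severity": elem.get("severity"),
        "dn": elem.get("dn"),
        # The exported description carries trailing whitespace; strip it.
        "descr": (elem.get("descr") or "").strip(),
    }


summary = summarize_fault(FAULT_XML)
print(summary["code"], summary["severity"], summary["dn"])
```

The same function should work on the full record, since it only reads the attributes it needs and ignores the rest.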


Derek Szolucha
Level 1

Hi Hami,

It appears that one of your DIMMs, DDR4_P1_A1, has crossed the threshold for correctable ECC errors. You can find more details in the server SEL logs. The DIMM most likely has to be replaced. If you have an active Cisco support contract, you can open a Cisco SR to RMA the DIMM.
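
To make the reading of the sensor name concrete, here is a small Python sketch that pulls the sensor out of the fault description. The pattern `DDR4_P<n>_<slot>_ECC`, and reading `P1` as processor 1, are assumptions inferred from this single fault message, not a documented naming scheme. `parse_dimm_sensor` is a hypothetical helper.

```python
import re

# Assumed pattern, inferred only from the one sensor name in this thread.
SENSOR_RE = re.compile(r"Reason (DDR4_P(\d+)_([A-Z]\d+)_ECC):")


def parse_dimm_sensor(descr):
    """Return (sensor, cpu, slot) from a fault description, or None if no match."""
    match = SENSOR_RE.search(descr)
    if match is None:
        return None
    sensor, cpu, slot = match.groups()
    return sensor, int(cpu), slot


descr = ("sys/rack-unit-4/health-led shows error. "
         "Reason DDR4_P1_A1_ECC:Sensor Threshold Crossed; ")
result = parse_dimm_sensor(descr)
print(result)
```

For the description in the original post this yields the sensor `DDR4_P1_A1_ECC` with slot `A1`, which is the slot to look for in the SEL logs.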

Derek

Hamidsattarrana
Level 1

Hi,

How can I check whether we have an active Cisco support contract?

1. If you log a support call with Cisco and enter the server serial number, it should tell you whether the server is still under maintenance.

2. You can check device coverage by entering the serial number at https://cway.cisco.com/sncheck/

Hi,

I checked, and it's not under warranty.

I am also having these errors.

Server 4 (service profile: org-root/org-server/org-VoIP/ls-Storage1) health: inoperable

RAID Battery on server 4 operability: inoperable. Reason: BBU has failed, needs replacement

DIMM DIMM_A1 on server 4 operability: inoperable

sys/rack-unit-4/health-led shows error. Reason DDR4_P1_A1_ECC:Sensor Threshold Crossed;

 

Does anyone know how I can fix this error? I don't have any experience with UCS, so any help would be really appreciated.

 

Thank you

 

 

 

 

The error messages for failing hardware are usually fairly self-explanatory:

RAID Battery on server 4 operability: inoperable. Reason: BBU has failed, needs replacement

= replace the battery on the RAID controller, or swap it with a known good one.

DIMM DIMM_A1 on server 4 operability: inoperable

= replace the DIMM in the failed position. Sometimes reseating the DIMM, or swapping it with a DIMM from another slot and reseating both, is worth trying first. Also have a look at Troubleshoot DIMM Memory Issues in UCS.

Hamidsattarrana
Level 1

Hi,

Thanks Riaan.

Do you know how I can enable SSH? I have access to the UCS GUI, but I don't know the SSH password. Can I reset it or set up a new password on UCS?

 

Hamidsattarrana
Level 1

Hi,

I finally found the issue: the RAM in DIMM A1 is not working anymore. So all of the following errors are due to this.

Error: DIMM DIMM_A1 on server 4 operability: inoperable
Code: F0185

Error: sys/rack-unit-4/health-led shows error. Reason DDR4_P1_A1_ECC:Sensor Threshold Crossed;
Code: F1236

Error: Server 4 (service profile: org-root/org-server/org-VoIP/ls-Storage1) health: inoperable
Code: F0317
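
The three faults above can be kept together in a small lookup, sketched here in Python. The codes and messages are copied from this post; the `FAULTS` list layout and the `codes_for_server` helper are illustrative only and not part of UCS Manager.

```python
# Fault codes and messages as listed in this post.
FAULTS = [
    {"code": "F0185", "server": 4,
     "msg": "DIMM DIMM_A1 on server 4 operability: inoperable"},
    {"code": "F1236", "server": 4,
     "msg": "sys/rack-unit-4/health-led shows error. "
            "Reason DDR4_P1_A1_ECC:Sensor Threshold Crossed;"},
    {"code": "F0317", "server": 4,
     "msg": "Server 4 (service profile: org-root/org-server/org-VoIP/ls-Storage1) "
            "health: inoperable"},
]


def codes_for_server(faults, server):
    """Return the fault codes recorded for one server, in listed order (hypothetical helper)."""
    return [f["code"] for f in faults if f["server"] == server]


server4_codes = codes_for_server(FAULTS, 4)
print(server4_codes)
```

Keeping the codes alongside the messages makes it easier to confirm that all three clear once the DIMM is replaced.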

Now I just need a couple of things.

First, how can I enable SSH from the UCS GUI?

Second, how can I shut down this server gracefully?

Thank you

 
