10-02-2020 01:15 AM
Hello all... On our RSP5-SE we keep getting a DIE_DIMM3 error, and show alarms reports the following:
RP/0/RSP0/CPU0:ios#sh alarms detail system active
Wed Sep 30 10:38:32.055 UTC
------------------------------------------------------------------------------------
Active Alarms
------------------------------------------------------------------------------------
Description: DIE_DIMM3: in a failure state.
Location: 0/RSP0
AID: SM/HW_ENVMON_SENSOR_ALARM/4
Tag String: FAM_FAULT_TAG_HW_ENVMON_NM_SENSOR_FAULT
Module Name: N/A
EID: CHASSIS/LCC/1:CONTAINER/CC/1:MODULE/RP/1:MODULE/MOTHER_BOARD/1:SENSOR/TEMP/8
Reporting Agent ID: 50
Pending Sync: false
Severity: Minor
Status: Set
Group: Environ
Set Time: 09/30/2020 10:28:41 UTC
Clear Time: -
Service Affecting: NotServiceAffecting
Transport Direction: NotSpecified
Transport Source: NotSpecified
Threshold Value: -
Current Value: -
Bucket Type: NotSpecified
Event Type: Default
Interface: NIL
Alarm Name: sensor in a failure state
RP/0/RSP0/CPU0:ios#
After doing some digging, I learned that the RSP5-SE ships with 40GB of ECC (error-correcting) DIMM memory. I removed the metal protective plate to see whether moving the DIMMs around would fix this error. Here is what I found:
There are 6 DIMM slots, but only 5 are populated, and each memory module is 8GB. 5 x 8GB = 40GB, which matches the total Cisco ships the RSP5-SE with. However, leaving a DIMM slot empty constantly triggers this DIMM3 error. Is this a bug in IOS XR x64 6.5.1? Is this normal/acceptable? I realize the error is merely "cosmetic", but it is still an error. Any input will be greatly appreciated.
10-02-2020 09:29 AM
This appears to be fixed by CSCvq17023.
RSP5: Add support for displaying/hiding DIMM. RSP5 TR can have 16G or 24G memory(8+8+8) or (16+8).
Depending on number of DIMMs physically present on board, user should only see corresponding DIMM sensor.
The bug also includes some wording saying they tested the sensors on -SE cards like yours.
Integrated-releases: | 06.06.03 07.00.02 07.01.01 07.02.01 |
Let me double-check with the folks who fixed this to make sure it will fix your condition.
Sam
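To make the behaviour the bug describes concrete (show a DIMM sensor only when a DIMM is physically present), here is a hypothetical Python sketch. It is not Cisco's code; it assumes the six slots map to sensors named DIE_DIMM0 through DIE_DIMM5 and that a per-slot "present" flag is available. It also reproduces the 5 x 8GB = 40GB arithmetic from the first post.

```python
# Hypothetical model of the CSCvq17023 behaviour: only populated DIMM slots
# get a visible sensor. Sensor names and the "present" input are assumptions.
SLOTS = [f"DIE_DIMM{i}" for i in range(6)]  # 6 physical DIMM slots
DIMM_SIZE_GB = 8                             # each module in this thread is 8GB


def visible_dimm_sensors(present: dict[str, bool]) -> list[str]:
    """Return sensor names for populated slots only, in slot order."""
    return [slot for slot in SLOTS if present.get(slot, False)]


# Example: five 8GB DIMMs with DIMM3 left empty, as on the RSP5-SE above.
present = {slot: slot != "DIE_DIMM3" for slot in SLOTS}
shown = visible_dimm_sensors(present)
print(shown)
# ['DIE_DIMM0', 'DIE_DIMM1', 'DIE_DIMM2', 'DIE_DIMM4', 'DIE_DIMM5']
print(len(shown) * DIMM_SIZE_GB, "GB installed")  # 40 GB installed
```

With this filtering, the empty slot simply has no sensor, so there is nothing to raise a "failure state" alarm against.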
10-02-2020 11:38 PM
Thank you, Sam. I would have expected to see sensors only for the DIMMs actually installed on the RSP. I moved the DIMMs around to see whether the alarm would follow the empty slot, and it did. For example, when I moved DIMM5 to DIMM3, DIMM5 showed the error. When I moved the DIMMs around again and left DIMM2 empty, the same error appeared on DIMM2. Below is the output of show env all with DIMM3 empty. The DIMM3 temperature sensor shows "-" (null), presumably because the RSP can't read a temperature when there is no DIMM in slot DIMM3.
================================================================================
Location  TEMPERATURE       Value    Crit   Major  Minor  Minor  Major  Crit
          Sensor            (deg C)  (Lo)   (Lo)   (Lo)   (Hi)   (Hi)   (Hi)
--------------------------------------------------------------------------------
0/RSP0
          DIE_FabArbiter0   53       -10    -5     0      115    125    140
          DIE_FabSwitch0    62       -10    -5     0      115    125    140
          DIE_FabSwitch1    58       -10    -5     0      115    125    140
          DIE_CPU           46       -10    -5     0      90     95     110
          DIE_PCH           49       -10    -5     0      87     100    115
          DIE_DIMM0         41       -10    -5     0      80     85     100
          DIE_DIMM2         41       -10    -5     0      80     85     100
          DIE_DIMM3         -        -10    -5     0      80     85     100
          DIE_DIMM4         37       -10    -5     0      80     85     100
          DIE_DIMM5         36       -10    -5     0      80     85     100
          SKYBLT0_Inlet     43       -10    -5     0      80     85     100
          SKYBLT1_Inlet     39       -10    -5     0      80     85     100
          High_Power        58       -10    -5     0      80     85     100
          AIR_Outlet        48       -10    -5     0      80     85     100
          Inlet             36       -10    -5     0      70     85     100
          Hotspot           53       -10    -5     0      90     93     95
          DIE_Aldrin        61       -10    -5     0      100    110    125
================================================================================
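A quick way to spot an empty slot in this output is to look for DIMM sensors whose value column is "-". The Python sketch below does that on a few rows copied from the table above. It assumes the column layout shown here (sensor name, value, then six thresholds) and is only an illustration, not a Cisco tool.

```python
# A few DIMM rows copied from the `show env all` output above (DIMM3 empty).
SHOW_ENV = """\
DIE_DIMM0 41 -10 -5 0 80 85 100
DIE_DIMM2 41 -10 -5 0 80 85 100
DIE_DIMM3 - -10 -5 0 80 85 100
DIE_DIMM4 37 -10 -5 0 80 85 100
DIE_DIMM5 36 -10 -5 0 80 85 100
"""


def unreadable_dimm_sensors(text: str) -> list[str]:
    """Return DIMM sensor names whose temperature value column is '-'."""
    missing = []
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: name, value, then six threshold columns (8 fields).
        if len(fields) == 8 and fields[0].startswith("DIE_DIMM") and fields[1] == "-":
            missing.append(fields[0])
    return missing


print(unreadable_dimm_sensors(SHOW_ENV))  # ['DIE_DIMM3']
```

Any sensor this returns lines up with the slot that raised the "in a failure state" alarm in this thread.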
10-04-2020 02:36 AM
I moved the memory out of DIMM3 and now the error is gone, but total memory shows as less than 40GB:
RP/0/RSP0/CPU0:ios#sh memory summary
Sun Oct 4 09:32:33.728 UTC
node: node0_RSP0_CPU0
------------------------------------------------------------------
Physical Memory: 35123M total (31072M available)
Application Memory : 35123M (29793M available)
Image: 4M (bootram: 0M)
Reserved: 0M, IOMem: 0M, flashfsys: 0M
Total shared window: 131M
RP/0/RSP0/CPU0:ios#
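For reference, the gap between installed and reported memory works out as follows. This sketch assumes the "M" in show memory summary means MiB and that each of the five DIMMs is 8 GiB. It only quantifies the difference and does not explain it.

```python
# Figures from this thread: 5 x 8GB DIMMs installed, and
# `show memory summary` reports 35123M physical memory.
# Assumption: "M" means MiB and each DIMM is 8 GiB.
INSTALLED_MIB = 5 * 8 * 1024  # 40960 MiB
REPORTED_MIB = 35123

shortfall_mib = INSTALLED_MIB - REPORTED_MIB
print(f"installed: {INSTALLED_MIB} MiB ({INSTALLED_MIB / 1024:.1f} GiB)")
print(f"reported:  {REPORTED_MIB} MiB ({REPORTED_MIB / 1024:.1f} GiB)")
print(f"not shown: {shortfall_mib} MiB ({shortfall_mib / 1024:.1f} GiB, "
      f"{100 * shortfall_mib / INSTALLED_MIB:.1f}% of installed)")
```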
10-07-2020 09:38 AM
It turns out the bug I quoted only fixes the -TR version of the card. I have just raised a bug to fix the -SE version (CSCvw01617).
I do not have an ETA at this time.
Sam
10-12-2020 07:04 AM
Thanks so much, Sam. I’m guessing this is just “cosmetic” and not service affecting.
10-12-2020 09:11 AM
Right, having the alarm set won't cause any impact on the system.
Sam
10-09-2020 01:21 AM - edited 10-09-2020 01:21 AM
I also noticed that no matter where I position the DIMMs, only 36GB is recognized when I issue
show memory summary
Maybe that's a bug too?
11-18-2020 01:41 AM
So I found this today
I guess this explains why show memory summary reports less memory than is physically installed.