08-05-2011 02:50 PM (last edited on 03-25-2019 01:36 PM by ciscomoderator)
Just got that error flagged as "critical". I don't understand what "lower-non-recoverable" means.
Does that mean it is cold? Or is that the first alert for a high temperature?
Hope I am not being a brick, but that error just isn't obvious to me.
Craig
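Assuming the UCS fault text follows the standard IPMI sensor-threshold names (it appears to), each sensor has three thresholds on each side of its normal range, ordered lower non-recoverable < lower critical < lower non-critical < upper non-critical < upper critical < upper non-recoverable. "Lower-non-recoverable" would then mean a reading below the lowest threshold: "too cold" for a temperature sensor, "too slow or stopped" for a fan-speed sensor. It is not an early warning of high temperature. The sketch below classifies a reading against hypothetical fan-speed thresholds; the numbers are illustrative, not real UCS limits, and counting a reading equal to a threshold as crossing it is an assumption.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    LOWER_NON_CRITICAL = "lower-non-critical"
    LOWER_CRITICAL = "lower-critical"
    LOWER_NON_RECOVERABLE = "lower-non-recoverable"
    UPPER_NON_CRITICAL = "upper-non-critical"
    UPPER_CRITICAL = "upper-critical"
    UPPER_NON_RECOVERABLE = "upper-non-recoverable"


def classify(reading: float, t: dict) -> Severity:
    """Return the most severe threshold band the reading falls into.

    Expects IPMI-style ordering lnr < lc < lnc < unc < uc < unr.
    A reading equal to a threshold counts as crossing it (an assumption).
    """
    # Lower side: check the most severe threshold first.
    if reading <= t["lnr"]:
        return Severity.LOWER_NON_RECOVERABLE
    if reading <= t["lc"]:
        return Severity.LOWER_CRITICAL
    if reading <= t["lnc"]:
        return Severity.LOWER_NON_CRITICAL
    # Upper side, again most severe first.
    if reading >= t["unr"]:
        return Severity.UPPER_NON_RECOVERABLE
    if reading >= t["uc"]:
        return Severity.UPPER_CRITICAL
    if reading >= t["unc"]:
        return Severity.UPPER_NON_CRITICAL
    return Severity.OK


# Hypothetical fan-speed thresholds in RPM; illustrative only, not UCS limits.
FAN_RPM = {"lnr": 300, "lc": 600, "lnc": 900, "unc": 20000, "uc": 22000, "unr": 25000}

if __name__ == "__main__":
    for rpm in (0, 700, 5000, 23000):
        print(f"{rpm} rpm -> {classify(rpm, FAN_RPM).value}")
```

A fan that reads 0 RPM lands in the most severe lower band, while a hot component would trip the upper bands instead.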
08-05-2011 10:45 PM
Check your fans and the environment temperature, e.g. make sure there is free airflow outside the chassis.
08-08-2011 12:25 PM
Thanks for the reply; here is the info:
UCS02-A(local-mgmt)# show version
System version: 1.4(2b)
UCS02-A(local-mgmt)# connect iom 3
Attaching to FEX 3 ...
To exit type 'exit', to abort type '$.'
fex-1# show platform software cmcctrl thermal status
magic: 0x486f7403 # OK
valid: 1
pid: 396
interval: 30 # seconds
write_ts: 1312830870 # Mon Aug 8 19:14:30 2011
stale_ts: 1312830915 # Mon Aug 8 19:15:15 2011 OK
now: 1312830876 # Mon Aug 8 19:14:36 2011
status: 2 # PASSIVE
policy_state: 1 # COOL
xreading: 1 # MISSING_DATA_SAFE_MODE
hwconf_valid: 1
maxfans: 8
fan[1].fault/read/req: 1/0/90 # MISSING
fan[2].fault/read/req: 1/0/90 # MISSING
fan[3].fault/read/req: 1/0/90 # MISSING
fan[4].fault/read/req: 1/0/90 # MISSING
fan[5].fault/read/req: 1/0/90 # MISSING
fan[6].fault/read/req: 1/0/90 # MISSING
fan[7].fault/read/req: 1/0/90 # MISSING
fan[8].fault/read/req: 1/0/90 # MISSING
nblades: 8
blade[1].present/policy_state: 2/1 # PRESENT/COOL
blade[2].present/policy_state: 2/1 # PRESENT/COOL
blade[3].present/policy_state: 2/1 # PRESENT/COOL
blade[4].present/policy_state: 2/1 # PRESENT/COOL
blade[5].present/policy_state: 2/1 # PRESENT/COOL
blade[6].present/policy_state: 2/1 # PRESENT/COOL
blade[7].present/policy_state: 2/1 # PRESENT/COOL
blade[8].present/policy_state: 2/1 # PRESENT/COOL
IOM.RWTEMPB: 37
IOM_THERM: 1 # COOL
PEER_STATUS: 1 # ACTIVE
PEER_IOM_THERM: 1 # COOL
CAUSES.Local.counts: COOL=45
CAUSES.Local.causes, first 8 reported:
1(b1),1(reading),2(sensor_id),47,"b1.FM_TEMP_SENS_IO"
1(b1),1(reading),2(sensor_id),48,"b1.FM_TEMP_SEN_REAR"
1(b1),1(reading),2(sensor_id),49,"b1.P2_TEMP_SENS"
1(b1),1(reading),2(sensor_id),50,"b1.P1_TEMP_SENS"
1(b1),1(reading),2(sensor_id),51,"b1.DDR3_P2_D1_TMP"
1(b1),1(reading),2(sensor_id),57,"b1.DDR3_P1_A1_TMP"
2(b2),1(reading),2(sensor_id),47,"b2.FM_TEMP_SENS_IO"
2(b2),1(reading),2(sensor_id),48,"b2.FM_TEMP_SEN_REAR"
CAUSES.Peer.counts: COOL=10
CAUSES.Peer.causes, first 8 reported:
1(b1),7(absent),1(not_applicable),0,"b1"
2(b2),7(absent),1(not_applicable),0,"b2"
3(b3),7(absent),1(not_applicable),0,"b3"
4(b4),7(absent),1(not_applicable),0,"b4"
5(b5),7(absent),1(not_applicable),0,"b5"
6(b6),7(absent),1(not_applicable),0,"b6"
7(b7),7(absent),1(not_applicable),0,"b7"
8(b8),7(absent),1(not_applicable),0,"b8"
UCS02-A# connect local-mgmt a
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
lostarbitration 16925
fixup 24145
fixup 16
lostarbitration 67376
fixup 83687
fixup 1
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
lostarbitration 16957
fixup 24179
fixup 16
lostarbitration 67504
fixup 83843
fixup 1
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
lostarbitration 16979
fixup 24203
fixup 16
lostarbitration 67592
fixup 83953
fixup 1
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
lostarbitration 17010
fixup 24236
fixup 16
lostarbitration 67716
fixup 84106
fixup 1
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
lostarbitration 17038
fixup 24269
fixup 16
lostarbitration 67828
fixup 84244
fixup 1
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep "lostarb|fixup"
lostarbitration 17067
fixup 24299
fixup 16
lostarbitration 67944
fixup 84387
fixup 1
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep "lostarb|fixup"
lostarbitration 17090
fixup 24324
fixup 16
lostarbitration 68036
fixup 84499
fixup 1
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep "lostarb|fixup"
lostarbitration 17116
fixup 24350
fixup 16
lostarbitration 68144
fixup 84628
fixup 1
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep "lostarb|fixup"
lostarbitration 17150
fixup 24389
fixup 16
lostarbitration 68276
fixup 84794
fixup 1
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep "lostarb|fixup"
lostarbitration 17177
fixup 24421
fixup 16
lostarbitration 68384
fixup 84929
fixup 1
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
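The `write_ts`, `stale_ts` and `now` fields in the thermal status output are Unix epoch seconds. In both outputs posted in this thread, `stale_ts` is 45 s after `write_ts`, i.e. 1.5 times the 30 s `interval`. Decoding in UTC reproduces the annotation in the output above (19:14:30), while the other output in this thread is annotated 7 hours ahead of UTC, so the CLI appears to annotate in the system's local time zone. The sketch below decodes the values; reading the "OK" note on `stale_ts` as "`now` has not passed `stale_ts` yet" is an assumption, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

INTERVAL_S = 30  # "interval: 30 # seconds" in the thermal status output


def to_utc(ts: int) -> datetime:
    """Convert an epoch value such as write_ts into an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def stale_margin(write_ts: int, stale_ts: int) -> int:
    """Seconds between the last write and the point the data counts as stale."""
    return stale_ts - write_ts


def is_fresh(now: int, stale_ts: int) -> bool:
    # Assumption: the data stays valid until `now` moves past `stale_ts`.
    return now <= stale_ts


if __name__ == "__main__":
    write_ts, stale_ts, now = 1312830870, 1312830915, 1312830876  # values above
    print(to_utc(write_ts).isoformat())
    margin = stale_margin(write_ts, stale_ts)
    print(margin, margin / INTERVAL_S)  # seconds, and the same in polling intervals
    print(is_fresh(now, stale_ts))
```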
08-05-2011 11:10 PM
Hi Craig,
Could you help me with the following:
1) What version of UCSM are you running?
2) Can you capture the following output from the CLI? It gives the thermal status summary for the chassis and will help us determine whether there is a thermal condition.
From the CLI on either FI:
# connect iom 3
# show platform software cmcctrl thermal status
3) Has the fault cleared, or is it still there? Does the fault flap, i.e. clear and come back?
./Abhinav
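When comparing several captures of `show platform software cmcctrl thermal status`, a small parser helps pick out the lines that matter in this thread: fans annotated MISSING, blades annotated ABSENT, and the `xreading` state. This is a sketch that assumes every field line has the `key: value # note` shape seen in the outputs posted here; the function names are hypothetical, and it only collects the CLI's own annotations without interpreting them.

```python
from __future__ import annotations

import re

# "key: value # note"; the note after '#' is optional.
_LINE = re.compile(r"^\s*([^:]+?):\s*(.*?)\s*(?:#\s*(.*))?$")


def parse_thermal_status(text: str) -> dict[str, tuple[str, str]]:
    """Map each 'key: value # note' line to a (value, note) pair."""
    fields = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:  # lines without a colon (e.g. cause tuples) are skipped
            key, value, note = m.groups()
            fields[key] = (value, note or "")
    return fields


def summarize(fields: dict[str, tuple[str, str]]) -> dict:
    """Collect the annotations that this thread focuses on."""
    missing_fans = [k.split(".")[0] for k, (_, note) in fields.items()
                    if k.startswith("fan[") and note == "MISSING"]
    absent_blades = [k.split(".")[0] for k, (_, note) in fields.items()
                     if k.startswith("blade[") and note.startswith("ABSENT")]
    return {
        "status": fields.get("status", ("", ""))[1],
        "xreading": fields.get("xreading", ("", ""))[1],
        "missing_fans": missing_fans,
        "absent_blades": absent_blades,
    }


# Excerpt of one of the outputs posted in this thread.
SAMPLE = """magic: 0x486f7403 # OK
status: 1 # ACTIVE
xreading: 1 # MISSING_DATA_SAFE_MODE
fan[1].fault/read/req: 1/0/90 # MISSING
fan[2].fault/read/req: 1/0/90 # MISSING
blade[1].present/policy_state: 1/1 # ABSENT/COOL
blade[2].present/policy_state: 1/1 # ABSENT/COOL
1(b1),7(absent),1(not_applicable),0,"b1"
"""

if __name__ == "__main__":
    print(summarize(parse_thermal_status(SAMPLE)))
```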
08-08-2011 02:55 AM
Hi abbharga,
I have the same problem
1) What version of the UCSM are you running?
- 1.4(3l)
2) Can you capture the following output from the CLI? It gives the thermal status summary for the chassis and will help us determine whether there is a thermal condition.
From the CLI on either FI:
# connect iom 3
# show platform software cmcctrl thermal status
- show platform software cmcctrl thermal status
magic: 0x486f7403 # OK
valid: 1
pid: 435
interval: 30 # seconds
write_ts: 1312796925 # Mon Aug 8 16:48:45 2011
stale_ts: 1312796970 # Mon Aug 8 16:49:30 2011 OK
now: 1312796943 # Mon Aug 8 16:49:03 2011
status: 1 # ACTIVE
policy_state: 1 # COOL
xreading: 1 # MISSING_DATA_SAFE_MODE
hwconf_valid: 1
maxfans: 8
fan[1].fault/read/req: 1/0/90 # MISSING
fan[2].fault/read/req: 1/0/90 # MISSING
fan[3].fault/read/req: 1/0/90 # MISSING
fan[4].fault/read/req: 1/0/90 # MISSING
fan[5].fault/read/req: 1/0/90 # MISSING
fan[6].fault/read/req: 1/0/90 # MISSING
fan[7].fault/read/req: 1/0/90 # MISSING
fan[8].fault/read/req: 1/0/90 # MISSING
nblades: 8
blade[1].present/policy_state: 1/1 # ABSENT/COOL
blade[2].present/policy_state: 1/1 # ABSENT/COOL
blade[3].present/policy_state: 1/1 # ABSENT/COOL
blade[4].present/policy_state: 1/1 # ABSENT/COOL
blade[5].present/policy_state: 1/1 # ABSENT/COOL
blade[6].present/policy_state: 1/1 # ABSENT/COOL
blade[7].present/policy_state: 1/1 # ABSENT/COOL
blade[8].present/policy_state: 1/1 # ABSENT/COOL
IOM.RWTEMPB: 40
IOM_THERM: 1 # COOL
PEER_STATUS: 2 # PASSIVE
PEER_IOM_THERM: 1 # COOL
CAUSES.Local.counts: COOL=10
CAUSES.Local.causes, first 8 reported:
1(b1),7(absent),1(not_applicable),0,"b1"
2(b2),7(absent),1(not_applicable),0,"b2"
3(b3),7(absent),1(not_applicable),0,"b3"
4(b4),7(absent),1(not_applicable),0,"b4"
5(b5),7(absent),1(not_applicable),0,"b5"
6(b6),7(absent),1(not_applicable),0,"b6"
7(b7),7(absent),1(not_applicable),0,"b7"
8(b8),7(absent),1(not_applicable),0,"b8"
CAUSES.Peer.counts: COOL=109
CAUSES.Peer.causes, first 8 reported:
1(b1),1(reading),2(sensor_id),47,"b1.FM_TEMP_SENS_IO"
1(b1),1(reading),2(sensor_id),48,"b1.FM_TEMP_SEN_REAR"
1(b1),1(reading),2(sensor_id),49,"b1.P2_TEMP_SENS"
1(b1),1(reading),2(sensor_id),50,"b1.P1_TEMP_SENS"
1(b1),1(reading),2(sensor_id),51,"b1.DDR3_P2_D1_TMP"
1(b1),1(reading),2(sensor_id),52,"b1.DDR3_P2_D2_TMP"
1(b1),1(reading),2(sensor_id),53,"b1.DDR3_P2_E1_TMP"
1(b1),1(reading),2(sensor_id),55,"b1.DDR3_P2_F1_TMP"
3) Has the fault cleared, or is it still there? Does the fault flap, i.e. clear and come back?
- Still there, permanently critical.
08-08-2011 05:05 AM
Hi Attavit,
A quick check: does this chassis have blades installed? The output above shows that it cannot see the blades.
There also seems to be an issue reading the fans.
Can you also get me this output? Please run each command once a minute and send me 5 outputs for each IOM.
(You need to be in the local-mgmt context to run these commands):
# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"
# show tech-support chassis 3 iom 2 brief | no-more | egrep "lostarb|fixup"
Another option is to open a TAC service request, and we can look at this from there.
./Abhinav
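The point of capturing the `lostarb|fixup` counters several times, a minute apart, is to see which ones are still climbing (ongoing events) and which are static (history). The sketch below computes the increase of each counter position between consecutive captures, using the first two IOM 1 captures posted in this thread. Pairing counters by their position in the output is an assumption, and the helper names are hypothetical; what rate counts as abnormal is not stated in the thread.

```python
from __future__ import annotations

COUNTERS = ("lostarbitration", "fixup")


def parse_counters(text: str) -> list[tuple[str, int]]:
    """Turn one egrep capture into an ordered list of (counter, value) pairs."""
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in COUNTERS:  # prompt lines are skipped
            pairs.append((parts[0], int(parts[1])))
    return pairs


def deltas(samples: list[str]) -> list[list[int]]:
    """Increase of each counter position between consecutive captures."""
    parsed = [parse_counters(s) for s in samples]
    result = []
    for prev, cur in zip(parsed, parsed[1:]):
        if [name for name, _ in prev] != [name for name, _ in cur]:
            raise ValueError("captures do not list the same counters in the same order")
        result.append([c - p for (_, p), (_, c) in zip(prev, cur)])
    return result


def always_rising(rows: list[list[int]]) -> list[int]:
    """Positions whose counter grew in every interval."""
    if not rows:
        return []
    return [i for i in range(len(rows[0])) if all(row[i] > 0 for row in rows)]


IOM1_T0 = """lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
lostarbitration 16925
fixup 24145
fixup 16
lostarbitration 67376
fixup 83687
fixup 1"""

IOM1_T1 = """lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
lostarbitration 16957
fixup 24179
fixup 16
lostarbitration 67504
fixup 83843
fixup 1"""

if __name__ == "__main__":
    rows = deltas([IOM1_T0, IOM1_T1])
    print(rows)
    print(always_rising(rows))
```

With more captures, `always_rising` narrows the list to the counters that increased in every interval.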
08-08-2011 06:31 AM
Hi Abbharga,
Today I restarted the chassis, and now there are no more critical logs from the IOM.
But I have something to share: before the restart, the log showed that the power supplies had no redundancy, which I think was wrong (4 power supplies were powered on).
08-08-2011 06:36 AM
Hi Attavit,
Thanks for the update; good to know you are not seeing the issues at the moment.
The bug I suspected of causing the fan issue can also cause this PSU issue.
If you see either of these issues come back, please send me the output of the commands I requested above, or feel free to open a TAC case.
./Abhinav
08-08-2011 06:47 AM
Hi Abbharga,
If the problem happens again, I'll get the output for you.
Thank you for your help
08-08-2011 10:22 PM
Hi Craig,
From the output above, I can see traces of a known bug affecting you.
Bug: CSCtq10987 IOM I2C driver, noisy PSU bus spoils next non-PSU IO operation
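The bug title refers to the IOM's I2C driver. The thread does not define the `lostarbitration` and `fixup` counters; assuming `lostarbitration` counts I2C arbitration losses, the mechanism is this: SDA is an open-drain, wired-AND line, so a master that releases SDA (sends 1) but reads back 0 concludes that another device is driving the bus and backs off. Noise that pulls SDA low during a 1 bit looks the same to a lone master, which is one plausible way a noisy bus could inflate such a counter; this is an illustration, not Cisco's root-cause analysis. The simplified sketch below sends one byte per master, MSB first, and ignores START, address/ACK and clock details.

```python
from __future__ import annotations


def transmit(bytes_by_master: dict[str, int],
             noise_low: frozenset[int] = frozenset()) -> dict[str, int | None]:
    """Simulate masters sending one byte each on a shared I2C SDA line.

    Returns, per master, the bit index (0 = MSB) at which it lost
    arbitration, or None if it sent its whole byte. `noise_low` lists bit
    indices at which (hypothetical) noise forces SDA low.
    """
    active = dict(bytes_by_master)
    lost: dict[str, int | None] = {name: None for name in bytes_by_master}
    for i in range(8):
        shift = 7 - i
        bits = {name: (b >> shift) & 1 for name, b in active.items()}
        sda = 0 if i in noise_low else 1
        for bit in bits.values():
            sda &= bit  # wired-AND: any device pulling low wins the wire
        for name, bit in bits.items():
            if bit == 1 and sda == 0:  # sent 1, read back 0: arbitration lost
                lost[name] = i
                del active[name]
    return lost


if __name__ == "__main__":
    print(transmit({"A": 0b10100000, "B": 0b10010000}))  # two real masters
    print(transmit({"A": 0b11000000}, noise_low=frozenset({1})))  # one master plus noise
```

Noise during a 0 bit goes unnoticed by this check, because the master is already pulling SDA low.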
Please go ahead and open a TAC case, and we'll be able to help you from there.
The bug is, however, fixed in the 1.4.3L version of the code.
Another suggestion: move from the 1.4.2b code to the latest 1.4.3 code, since 1.4.2b has a known major bug.
Thanks!
./Abhinav
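Checking whether a system already runs a release that carries a fix comes down to comparing version strings. The sketch below accepts the `show version` style (`1.4(2b)`) and the dotted style used in the reply above (`1.4.3L`). Treating `1.4.3L` and `1.4(3l)` as the same release, and sorting the trailing build letter alphabetically, are assumptions. The bug's own record lists the authoritative fixed-in releases.

```python
import re
from typing import Tuple

_PATTERNS = (
    re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$"),  # show version style, e.g. 1.4(2b)
    re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z]*)$"),    # dotted style, e.g. 1.4.3L
)


def parse_ucsm_version(text: str) -> Tuple[int, int, int, str]:
    """Return (major, minor, maintenance, build letter) for comparison."""
    s = text.strip().lower()
    for pattern in _PATTERNS:
        m = pattern.match(s)
        if m:
            major, minor, maint, build = m.groups()
            return int(major), int(minor), int(maint), build
    raise ValueError(f"unrecognised version string: {text!r}")


def at_least(running: str, fixed_in: str) -> bool:
    """True if `running` is the same as or newer than `fixed_in` (tuple order)."""
    return parse_ucsm_version(running) >= parse_ucsm_version(fixed_in)


if __name__ == "__main__":
    for running in ("1.4(2b)", "1.4(3l)"):
        print(running, at_least(running, "1.4.3L"))
```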