
IOM Temperature: lower-non-recoverable?

kg6itcraig
Level 1


Just got that error as "critical". I don't get what "lower-non-recoverable" means.

Does that mean it is cold? Or is that the first alert for a high temperature?

Hope I am not being a brick, but that error just isn't obvious to me.

Craig

My UCS Blog http://realworlducs.com
10 Replies

l.waldenberger
Level 1

Check your fans and your environment temperature, e.g. make sure air can flow freely around the outside of the chassis.


abbharga
Level 4

Hi Craig,

Please help me with the following:

1) What version of UCSM are you running?

2) Can you capture the following output from the CLI? It gives us the thermal status summary from the chassis, which will help determine whether there is a thermal condition.

From the CLI on either FI:

# connect iom 3

# show platform software cmcctrl thermal status

3) Has the fault cleared, or is it still there? Does the fault flap, i.e. clear and then come back?

./Abhinav

Hi abbharga,

I have the same problem

1) What version of the UCSM are you running?

- 1.4(3l)

2) Can you capture the following output from the CLI? It gives us the thermal status summary from the chassis, which will help determine whether there is a thermal condition.

From the CLI on either FI:

# connect iom 3

# show platform software cmcctrl thermal status

- show platform software cmcctrl thermal status

magic:                0x486f7403        # OK

valid:                1

pid:                435

interval:        30                # seconds

write_ts:        1312796925        # Mon Aug  8 16:48:45 2011

stale_ts:        1312796970        # Mon Aug  8 16:49:30 2011 OK

now:                1312796943        # Mon Aug  8 16:49:03 2011

status:                1                # ACTIVE

policy_state:        1                # COOL

xreading:        1                # MISSING_DATA_SAFE_MODE

hwconf_valid:        1

maxfans:                8

fan[1].fault/read/req:        1/0/90        # MISSING

fan[2].fault/read/req:        1/0/90        # MISSING

fan[3].fault/read/req:        1/0/90        # MISSING

fan[4].fault/read/req:        1/0/90        # MISSING

fan[5].fault/read/req:        1/0/90        # MISSING

fan[6].fault/read/req:        1/0/90        # MISSING

fan[7].fault/read/req:        1/0/90        # MISSING

fan[8].fault/read/req:        1/0/90        # MISSING

nblades:        8

blade[1].present/policy_state:        1/1        # ABSENT/COOL

blade[2].present/policy_state:        1/1        # ABSENT/COOL

blade[3].present/policy_state:        1/1        # ABSENT/COOL

blade[4].present/policy_state:        1/1        # ABSENT/COOL

blade[5].present/policy_state:        1/1        # ABSENT/COOL

blade[6].present/policy_state:        1/1        # ABSENT/COOL

blade[7].present/policy_state:        1/1        # ABSENT/COOL

blade[8].present/policy_state:        1/1        # ABSENT/COOL

IOM.RWTEMPB: 40

IOM_THERM: 1        # COOL

PEER_STATUS:                2                # PASSIVE

PEER_IOM_THERM: 1        # COOL

CAUSES.Local.counts: COOL=10

CAUSES.Local.causes, first 8 reported:

        1(b1),7(absent),1(not_applicable),0,"b1"

        2(b2),7(absent),1(not_applicable),0,"b2"

        3(b3),7(absent),1(not_applicable),0,"b3"

        4(b4),7(absent),1(not_applicable),0,"b4"

        5(b5),7(absent),1(not_applicable),0,"b5"

        6(b6),7(absent),1(not_applicable),0,"b6"

        7(b7),7(absent),1(not_applicable),0,"b7"

        8(b8),7(absent),1(not_applicable),0,"b8"

CAUSES.Peer.counts: COOL=109

CAUSES.Peer.causes, first 8 reported:

        1(b1),1(reading),2(sensor_id),47,"b1.FM_TEMP_SENS_IO"

        1(b1),1(reading),2(sensor_id),48,"b1.FM_TEMP_SEN_REAR"

        1(b1),1(reading),2(sensor_id),49,"b1.P2_TEMP_SENS"

        1(b1),1(reading),2(sensor_id),50,"b1.P1_TEMP_SENS"

        1(b1),1(reading),2(sensor_id),51,"b1.DDR3_P2_D1_TMP"

        1(b1),1(reading),2(sensor_id),52,"b1.DDR3_P2_D2_TMP"

        1(b1),1(reading),2(sensor_id),53,"b1.DDR3_P2_E1_TMP"

        1(b1),1(reading),2(sensor_id),55,"b1.DDR3_P2_F1_TMP"

3) Has the fault cleared, or is it still there? Does the fault flap, i.e. clear and then come back?

- Still there like permanent critical
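
For anyone who wants to scan this kind of thermal status output without reading it line by line, here is a minimal Python sketch. It assumes the output has been saved as plain text in the same layout as the paste above; the function name `summarize_thermal` and the regular expressions are my own and not part of any Cisco tool. It only lists the fans tagged MISSING and the blades tagged ABSENT.

```python
import re

# Patterns modeled on the pasted output, e.g.
#   fan[1].fault/read/req:        1/0/90        # MISSING
#   blade[1].present/policy_state:        1/1        # ABSENT/COOL
FAN_RE = re.compile(r"^fan\[(\d+)\]\.fault/read/req:\s+\d+/\d+/\d+\s+#\s*(\S+)")
BLADE_RE = re.compile(r"^blade\[(\d+)\]\.present/policy_state:\s+\d+/\d+\s+#\s*(\w+)/(\w+)")


def summarize_thermal(text):
    """Return the fan slots tagged MISSING and the blade slots tagged ABSENT."""
    missing_fans = []
    absent_blades = []
    for raw in text.splitlines():
        line = raw.strip()
        fan = FAN_RE.match(line)
        if fan:
            if fan.group(2) == "MISSING":
                missing_fans.append(int(fan.group(1)))
            continue
        blade = BLADE_RE.match(line)
        if blade and blade.group(2) == "ABSENT":
            absent_blades.append(int(blade.group(1)))
    return {"missing_fans": missing_fans, "absent_blades": absent_blades}


# A few lines in the format shown in this thread.
sample = """\
fan[1].fault/read/req:        1/0/90        # MISSING
fan[2].fault/read/req:        1/0/90        # MISSING
blade[1].present/policy_state:        1/1        # ABSENT/COOL
blade[2].present/policy_state:        2/1        # PRESENT/COOL
"""

summary = summarize_thermal(sample)
print(summary)
```

A non-empty `missing_fans` list only says the IOM failed to read those fans; it does not by itself tell you whether the fans are physically gone or the read failed for another reason.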

Hi Attavit,

A quick check: does this chassis have blades installed? The output above shows that the IOM cannot see any blades.

There also seems to be an issue reading the fans.

Can you send me this output too? Please run each command once a minute, five times per IOM, and send me all five outputs for each IOM.

(You need to be in the local-mgmt context to run these commands.)

# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"

# show tech-support chassis 3 iom 2 brief | no-more | egrep  "lostarb|fixup"

Another option is to open a TAC service request, and we can look at this from there.

./Abhinav
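
Sampling once a minute presumably serves to show which counters keep climbing between runs. As a rough sketch of that comparison (my own helper names `parse_counters` and `changed_counters`, nothing from UCSM), the Python below lines up two samples by position and reports only the counters that moved. The numbers are the first two IOM 1 samples posted in this thread; what each position corresponds to on the hardware is not stated here, so the index is just the order in the egrep output.

```python
def parse_counters(text):
    """Extract (name, value) pairs from 'lostarbitration N' / 'fixup N' lines."""
    counters = []
    for line in text.splitlines():
        parts = line.split()
        if (len(parts) == 2
                and parts[0] in ("lostarbitration", "fixup")
                and parts[1].isdigit()):
            counters.append((parts[0], int(parts[1])))
    return counters


def changed_counters(prev, curr):
    """Return (position, name, delta) for every counter that differs."""
    if [name for name, _ in prev] != [name for name, _ in curr]:
        raise ValueError("samples do not line up counter for counter")
    return [
        (i, name, new - old)
        for i, ((name, old), (_, new)) in enumerate(zip(prev, curr))
        if new != old
    ]


first = """
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
lostarbitration 16925
fixup 24145
fixup 16
lostarbitration 67376
fixup 83687
fixup 1
"""

second = """
lostarbitration 118395
fixup 158836
fixup 130
lostarbitration 472065
fixup 586748
fixup 1
lostarbitration 16957
fixup 24179
fixup 16
lostarbitration 67504
fixup 83843
fixup 1
"""

deltas = changed_counters(parse_counters(first), parse_counters(second))
for position, name, delta in deltas:
    print(f"counter {position} ({name}): +{delta}")
```

With these two samples, the first six counters are unchanged and four of the remaining six increase, which matches what you can see by eye in the pasted outputs.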

Hi Abbharga,

Today I restarted the chassis, and the IOM no longer logs the critical fault.

One more thing to share: before the restart, the log also showed a power supply redundancy fault, which I think was wrong, because all 4 power supplies were powered on.

Hi Attavit,

Thanks for the update and good to know you are not seeing the issues currently.

The bug which I was suspecting to cause the fan issue can also cause this PSU issue.

In case you see either of these issues coming back, do let me know the output of the command I requested above or feel free to open a TAC case.

./Abhinav

Hi Abbharga,

If the problem happens again, I'll get the output for you.

Thank you for your help

kg6itcraig
Level 1

Thanks for the reply, here is the info:

UCS02-A(local-mgmt)# show version

System version: 1.4(2b)

UCS02-A(local-mgmt)# connect iom 3

Attaching to FEX 3 ...

To exit type 'exit', to abort type '$.'

fex-1# show platform software cmcctrl thermal status

magic:                0x486f7403        # OK

valid:                1

pid:                396

interval:        30                # seconds

write_ts:        1312830870        # Mon Aug  8 19:14:30 2011

stale_ts:        1312830915        # Mon Aug  8 19:15:15 2011 OK

now:                1312830876        # Mon Aug  8 19:14:36 2011

status:                2                # PASSIVE

policy_state:        1                # COOL

xreading:        1                # MISSING_DATA_SAFE_MODE

hwconf_valid:        1

maxfans:                8

fan[1].fault/read/req:        1/0/90        # MISSING

fan[2].fault/read/req:        1/0/90        # MISSING

fan[3].fault/read/req:        1/0/90        # MISSING

fan[4].fault/read/req:        1/0/90        # MISSING

fan[5].fault/read/req:        1/0/90        # MISSING

fan[6].fault/read/req:        1/0/90        # MISSING

fan[7].fault/read/req:        1/0/90        # MISSING

fan[8].fault/read/req:        1/0/90        # MISSING

nblades:        8

blade[1].present/policy_state:        2/1        # PRESENT/COOL

blade[2].present/policy_state:        2/1        # PRESENT/COOL

blade[3].present/policy_state:        2/1        # PRESENT/COOL

blade[4].present/policy_state:        2/1        # PRESENT/COOL

blade[5].present/policy_state:        2/1        # PRESENT/COOL

blade[6].present/policy_state:        2/1        # PRESENT/COOL

blade[7].present/policy_state:        2/1        # PRESENT/COOL

blade[8].present/policy_state:        2/1        # PRESENT/COOL

IOM.RWTEMPB: 37

IOM_THERM: 1        # COOL

PEER_STATUS:                1                # ACTIVE

PEER_IOM_THERM: 1        # COOL

CAUSES.Local.counts: COOL=45

CAUSES.Local.causes, first 8 reported:

        1(b1),1(reading),2(sensor_id),47,"b1.FM_TEMP_SENS_IO"

        1(b1),1(reading),2(sensor_id),48,"b1.FM_TEMP_SEN_REAR"

        1(b1),1(reading),2(sensor_id),49,"b1.P2_TEMP_SENS"

        1(b1),1(reading),2(sensor_id),50,"b1.P1_TEMP_SENS"

        1(b1),1(reading),2(sensor_id),51,"b1.DDR3_P2_D1_TMP"

        1(b1),1(reading),2(sensor_id),57,"b1.DDR3_P1_A1_TMP"

        2(b2),1(reading),2(sensor_id),47,"b2.FM_TEMP_SENS_IO"

        2(b2),1(reading),2(sensor_id),48,"b2.FM_TEMP_SEN_REAR"

CAUSES.Peer.counts: COOL=10

CAUSES.Peer.causes, first 8 reported:

        1(b1),7(absent),1(not_applicable),0,"b1"

        2(b2),7(absent),1(not_applicable),0,"b2"

        3(b3),7(absent),1(not_applicable),0,"b3"

        4(b4),7(absent),1(not_applicable),0,"b4"

        5(b5),7(absent),1(not_applicable),0,"b5"

        6(b6),7(absent),1(not_applicable),0,"b6"

        7(b7),7(absent),1(not_applicable),0,"b7"

        8(b8),7(absent),1(not_applicable),0,"b8"

UCS02-A# connect  local-mgmt a

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

        lostarbitration 16925

        fixup 24145

        fixup 16

        lostarbitration 67376

        fixup 83687

        fixup 1

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

        lostarbitration 16957

        fixup 24179

        fixup 16

        lostarbitration 67504

        fixup 83843

        fixup 1

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

        lostarbitration 16979

        fixup 24203

        fixup 16

        lostarbitration 67592

        fixup 83953

        fixup 1

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

        lostarbitration 17010

        fixup 24236

        fixup 16

        lostarbitration 67716

        fixup 84106

        fixup 1

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 1 brief | no-more | egrep "lostarb|fixup"

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

        lostarbitration 17038

        fixup 24269

        fixup 16

        lostarbitration 67828

        fixup 84244

        fixup 1

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep  "lostarb|fixup"

        lostarbitration 17067

        fixup 24299

        fixup 16

        lostarbitration 67944

        fixup 84387

        fixup 1

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep  "lostarb|fixup"

        lostarbitration 17090

        fixup 24324

        fixup 16

        lostarbitration 68036

        fixup 84499

        fixup 1

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep  "lostarb|fixup"

        lostarbitration 17116

        fixup 24350

        fixup 16

        lostarbitration 68144

        fixup 84628

        fixup 1

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep  "lostarb|fixup"

        lostarbitration 17150

        fixup 24389

        fixup 16

        lostarbitration 68276

        fixup 84794

        fixup 1

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

UCS02-A(local-mgmt)# show tech-support chassis 3 iom 2 brief | no-more | egrep  "lostarb|fixup"

        lostarbitration 17177

        fixup 24421

        fixup 16

        lostarbitration 68384

        fixup 84929

        fixup 1

        lostarbitration 118395

        fixup 158836

        fixup 130

        lostarbitration 472065

        fixup 586748

        fixup 1

My UCS Blog http://realworlducs.com

Hi Craig,

From the output above I can see signs that you are affected by a known bug.

Bug: CSCtq10987    IOM I2C driver, noisy PSU bus spoils next non-PSU IO operation

Please go ahead and open a TAC case and we'll be able to help you from there.

This is, however, fixed in the 1.4.3L version of the code.

Another suggestion is to move from the 1.4.2b code to the latest 1.4.3 code, since there is a known major bug in 1.4.2b.

Thanks!

./Abhinav
