
I2C Errors after 2.2(6c)

Ryan Maclachlan
Level 1

It appears CSCue49366 has reared its ugly head again: after updating from a functional 2.2(3e) to 2.2(6c), we are fighting the symptoms outlined in that bug report (fans at 100% or not functional, no PSU readings). After a week of working with TAC, it's been decided that the last thing to try is a full chassis shutdown.

 

Hopefully it will come up without any errors. Heads up for anyone looking to update.


Niko Nikas
Cisco Employee

We would be able to quickly tell if it's an I2C congestion issue from your i2c.log file.

Could you supply the segments from that log below # I2C Bus 1 and # I2C Bus 2? (The two IOMs will have different values.)

 

I have seen some issues that looked similar but weren't necessarily due to I2C congestion.

Thanks for the reply

Here is IOM 1 (Currently Primary)

# I2C Bus Statistics Wed Oct 21 14:48:41 CST 2015
# I2C Bus 1
busn=0 nseg=2
segment 0 local
        wait_gt_deadline 1
segment 1 extended
        wait_gt_deadline 13
error_pca9541_per_device:
# I2C Bus 2
busn=1 nseg=5
segment 0 local
segment 1 chassis
        norxack 192
        wait_gt_deadline 5
segment 2 blade
segment 3 fan
segment 4 psu
        timeout 1777
        fixup 3490
        hub_sw_mbb 1713
        hub_sw_mbb_to 1713
gilroy.counter.reserved 1592
gilroy.counter.released 1592
gilroy.counter.status_chassis_off 3725
error_pca9541_per_device:

# I2C Device Statistics
iom.fru={SUCCESS=12}
iom.rtc={SUCCESS=7}
NAME=iom.woodside
No such path /sys/devices/platform/fsl-i2c.1/i2c-0/0-0048
NAME=iom.dcdc0
        detach_state driver iout iout_cal_offset iout_oc_fault_limit mfr_date mfr_id mfr_location mfr_model mfr_revision mfr_serial name ot_fault_limit ot_warn_limit temperature1 temperature2 vin vin_ov_fault_limit vin_uv_fault_limit vout vout_mode vout_n vout_ov_fault_limit vout_uv_fault_limit
NAME=iom.dcdc1
        detach_state driver iout iout_cal_offset iout_oc_fault_limit mfr_date mfr_id mfr_location mfr_model mfr_revision mfr_serial name ot_fault_limit ot_warn_limit temperature1 temperature2 vin vin_ov_fault_limit vin_uv_fault_limit vout vout_mode vout_n vout_ov_fault_limit vout_uv_fault_limit
iom.gpio0={SUCCESS=154}
iom.gpio1={SUCCESS=154}
iom.gpio2={SUCCESS=154}
iom.gpio3={SUCCESS=372}
iom.temp.inlet1={SUCCESS=19}
iom.temp.inlet2={SUCCESS=19}
iom.temp.woodside={SUCCESS=17}
c.fru={SUCCESS=1}
c.seeprom={SUCCESS=1523}
f.fm0.fru={SUCCESS=1}
f.fm1.fru={SUCCESS=1}
f.fm2.fru={SUCCESS=1}
f.fm3.fru={SUCCESS=1}
f.fm4.fru={SUCCESS=1}
f.fm5.fru={SUCCESS=1}
f.fm6.fru={SUCCESS=1}
f.fm7.fru={SUCCESS=1}
p.psu3.psmi={EBUSY=4}
p.psu0.fru={ETIMEDOUT=73}
p.psu0.psmi={EBUSY=4}
p.psu1.fru={ETIMEDOUT=73}
p.psu1.psmi={EBUSY=4}
p.psu2.fru={ETIMEDOUT=72}
p.psu2.psmi={EBUSY=4}
p.psu3.fru={ETIMEDOUT=72}
c.gpio0={ENXIO=8}
c.gpio1={ENXIO=8}
c.gpio2={ENXIO=8}
c.gpio3={ENXIO=8}
# I2C Driver sysctl entries
sysctl: error: permission denied on key 'net.ipv4.route.flush'
sysctl: error: permission denied on key 'kernel.cad_pid'
sysctl: error: permission denied on key 'kernel.cap-bound'
dev.i2c.disconnect_retry = 3
dev.i2c.post_trigger = 64
dev.i2c.norxack_blink = 5
dev.i2c.norxack_blink = 5
dev.i2c.fixup_blink = 0
dev.i2c.pca9541-workaround = 18
dev.i2c.wait_deadline = 30
dev.i2c.chassis_reservation.demand = 1
dev.i2c.chassis_reservation.lock_state = 1
dev.i2c.chassis_reservation.auto_release = 1
dev.i2c.chassis_reservation.on_demand = 0
dev.i2c.chassis_reservation.pause_gilroy_thread = 0
dev.i2c.chassis_reservation.min_notheld_ms = 150
dev.i2c.chassis_reservation.wait_extra_ms = 3000
dev.i2c.chassis_reservation.grace_ms = 750
dev.i2c.chassis_reservation.hold_ms = 1500
dev.i2c.chassis_reservation.wait_ms = 4500
dev.i2c.gilroy-debug-level = 3
dev.i2c.debug-level = 1
dev.i2c.pca9541-businit = 1
dev.i2c.pca9541-delay = 250
dev.i2c.bus2.write-cdelay = 100
dev.i2c.bus2.write-delay = 100
dev.i2c.bus1.write-cdelay = 30
dev.i2c.bus1.write-delay = 30

And IOM 2 (Subordinate)

# I2C Bus Statistics Wed Oct 21 14:49:16 CST 2015
# I2C Bus 1
busn=0 nseg=2
segment 0 local
segment 1 extended
        wait_gt_deadline 21
error_pca9541_per_device:
# I2C Bus 2
busn=1 nseg=5
segment 0 local
segment 1 chassis
        norxack 201
        wait_gt_deadline 19
segment 2 blade
segment 3 fan
        norxack 34
        pca9541clrerrprs 27
        pca9541seterr 5
        pca9541clrlasterr 1
        wait_gt_deadline 14
segment 4 psu
        timeout 10993
        fixup 17073
        hub_sw_mbb 6080
        hub_sw_mbb_to 6080
gilroy.error.pca9541_control_state 2
gilroy.error.do_reserve_pca9541_control 2
gilroy.counter.reserve 1
gilroy.counter.release 1
gilroy.counter.reserved 3670
gilroy.counter.already_reserved 1
gilroy.counter.released 3665
gilroy.counter.already_released 5
gilroy.counter.status_chassis_off 5723
error_pca9541_per_device:
                c.ms 2
                f.fm0.ms 3
                f.fm2.fru 5
                f.fm2.ms 18
                f.fm7.ms 6

# I2C Device Statistics
iom.fru={SUCCESS=12}
iom.rtc={SUCCESS=7}
NAME=iom.woodside
No such path /sys/devices/platform/fsl-i2c.1/i2c-0/0-0048
NAME=iom.dcdc0
        detach_state driver iout iout_cal_offset iout_oc_fault_limit mfr_date mfr_id mfr_location mfr_model mfr_revision mfr_serial name ot_fault_limit ot_warn_limit temperature1 temperature2 vin vin_ov_fault_limit vin_uv_fault_limit vout vout_mode vout_n vout_ov_fault_limit vout_uv_fault_limit
NAME=iom.dcdc1
        detach_state driver iout iout_cal_offset iout_oc_fault_limit mfr_date mfr_id mfr_location mfr_model mfr_revision mfr_serial name ot_fault_limit ot_warn_limit temperature1 temperature2 vin vin_ov_fault_limit vin_uv_fault_limit vout vout_mode vout_n vout_ov_fault_limit vout_uv_fault_limit
iom.gpio0={SUCCESS=780}
iom.gpio1={SUCCESS=780}
iom.gpio2={SUCCESS=780}
iom.gpio3={SUCCESS=1624}
iom.temp.inlet1={SUCCESS=291}
iom.temp.inlet2={SUCCESS=291}
iom.temp.woodside={SUCCESS=255}
c.fru={SUCCESS=1}
c.seeprom={SUCCESS=1309}
f.fm0.fc={SUCCESS=452}
f.fm0.fru={SUCCESS=8}
f.fm1.fc={SUCCESS=452}
f.fm1.fru={SUCCESS=6}
f.fm2.fc={SUCCESS=449}
f.fm2.fru={SUCCESS=18,ETIMEDOUT=1}
f.fm3.fc={SUCCESS=449}
f.fm3.fru={SUCCESS=6}
f.fm4.fc={SUCCESS=449,EBUSY=2}
f.fm4.fru={SUCCESS=3}
f.fm5.fc={SUCCESS=449}
f.fm5.fru={SUCCESS=3}
f.fm6.fc={SUCCESS=447}
f.fm6.fru={SUCCESS=11}
f.fm7.fc={SUCCESS=447}
f.fm7.fru={SUCCESS=3}
p.psu3.psmi={EBUSY=424}
p.psu0.fru={ETIMEDOUT=386}
p.psu0.psmi={EBUSY=441}
p.psu1.fru={ETIMEDOUT=386}
p.psu1.psmi={EBUSY=424}
p.psu2.fru={ETIMEDOUT=386}
p.psu2.psmi={EBUSY=424}
p.psu3.fru={ETIMEDOUT=386}
c.gpio0={ENXIO=8}
c.gpio1={ENXIO=8}
c.gpio2={ENXIO=8}
c.gpio3={ENXIO=8}
# I2C Driver sysctl entries
sysctl: error: permission denied on key 'net.ipv4.route.flush'
sysctl: error: permission denied on key 'kernel.cad_pid'
sysctl: error: permission denied on key 'kernel.cap-bound'
dev.i2c.disconnect_retry = 3
dev.i2c.post_trigger = 64
dev.i2c.norxack_blink = 5
dev.i2c.norxack_blink = 5
dev.i2c.fixup_blink = 0
dev.i2c.pca9541-workaround = 18
dev.i2c.wait_deadline = 30
dev.i2c.chassis_reservation.demand = 1
dev.i2c.chassis_reservation.lock_state = 1
dev.i2c.chassis_reservation.auto_release = 1
dev.i2c.chassis_reservation.on_demand = 0
dev.i2c.chassis_reservation.pause_gilroy_thread = 0
dev.i2c.chassis_reservation.min_notheld_ms = 150
dev.i2c.chassis_reservation.wait_extra_ms = 3000
dev.i2c.chassis_reservation.grace_ms = 750
dev.i2c.chassis_reservation.hold_ms = 1500
dev.i2c.chassis_reservation.wait_ms = 4500
dev.i2c.gilroy-debug-level = 3
dev.i2c.debug-level = 1
dev.i2c.pca9541-businit = 1
dev.i2c.pca9541-delay = 250
dev.i2c.bus2.write-cdelay = 100
dev.i2c.bus2.write-delay = 100
dev.i2c.bus1.write-cdelay = 30
dev.i2c.bus1.write-delay = 30
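
For anyone comparing the two IOMs by eye, here is a minimal Python sketch that pulls the per-segment counters out of the bus statistics section of an i2c.log. It assumes the layout shown in the pastes above (unindented `# I2C Bus N` and `segment N name` lines, indented counter lines). The function name `parse_bus_stats` and the `SAMPLE` excerpt are illustrative only, not part of any Cisco tool.

```python
import re
from collections import defaultdict

# Excerpt of the IOM 2 paste above, trimmed for brevity.
SAMPLE = """\
# I2C Bus 2
busn=1 nseg=5
segment 0 local
segment 1 chassis
        norxack 201
        wait_gt_deadline 19
segment 4 psu
        timeout 10993
        fixup 17073
gilroy.counter.reserved 3670
error_pca9541_per_device:
                c.ms 2
"""

def parse_bus_stats(text):
    """Return {(bus, segment_name): {counter: value}} for the bus statistics section."""
    stats = defaultdict(dict)
    bus = segment = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# I2C Device Statistics"):
            break  # per-device statistics follow; not handled here
        if not raw[0].isspace():
            # Unindented line: a bus header, a segment header, or something
            # bus-wide (gilroy.*, error_pca9541_per_device:) that ends the segment.
            bus_m = re.fullmatch(r"# I2C Bus (\d+)", line)
            seg_m = re.fullmatch(r"segment \d+ (\w+)", line)
            if bus_m:
                bus = int(bus_m.group(1))
            segment = seg_m.group(1) if seg_m else None
            continue
        counter_m = re.fullmatch(r"([\w.]+)\s+(\d+)", line)
        if counter_m and segment is not None:
            stats[(bus, segment)][counter_m.group(1)] = int(counter_m.group(2))
    return dict(stats)

for key, counters in parse_bus_stats(SAMPLE).items():
    print(key, counters)
```

Run against both pastes, this makes the gap on the psu segment easy to see: timeout is 1777 on IOM 1 and 10993 on IOM 2.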

OBFL Logs are showing seeprom errors on IOM 1:

2015-10-21T20:44:13.910978+00:00 CMC NOCSN_-3-CMC  OBFL:0:read_header_info:header read error#012
2015-10-21T20:44:18.367276+00:00 CMC NOCSN_-3-CMC  OBFL:0:read_seeprom_data:seeprom read error: -11 (Resource temporarily unavailable)#012
2015-10-21T20:44:18.367458+00:00 CMC NOCSN_-3-CMC  OBFL:0:mcserver_seeprom_read_data:seeprom read error#012
2015-10-21T20:44:22.677589+00:00 CMC NOCSN_-3-CMC  OBFL:0:read_seeprom_data:seeprom read error: -11 (Resource temporarily unavailable)#012
2015-10-21T20:44:22.677696+00:00 CMC NOCSN_-3-CMC  OBFL:0:read_header_info:header read error#012
2015-10-21T20:44:30.865812+00:00 CMC NOCSN_-3-CMC  OBFL:0:lazy_dmclient_init:Error initializing the dmclient#012
2015-10-21T20:44:30.866799+00:00 CMC NOCSN_-3-CMC  OBFL:0:seeprom_extension_init:lazy_dmclient_init error during seeprom_extension_init: -519#012
2015-10-21T20:44:30.868088+00:00 CMC NOCSN_-3-CMC  OBFL:0:read_seeprom_data:seeprom read error: -501 (Initializatoin failed)#012
2015-10-21T20:44:30.868590+00:00 CMC NOCSN_-3-CMC  OBFL:0:read_header_info:header read error#012
2015-10-21T20:44:31.972263+00:00 CMC NOCSN_-3-CMC  OBFL:0:read_seeprom_data:seeprom read error: -11 (Resource temporarily unavailable)#012
2015-10-21T20:44:31.972457+00:00 CMC NOCSN_-3-CMC  OBFL:0:mcserver_seeprom_read_data:seeprom read error#012

IOM 2 is showing bad PSU readings. All PSUs show 'online', but the thermal readings, statistics, names, and serial numbers are all blank:

2015-10-21T20:34:20.719265+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:1429:scan_power_supplies:Restarting ps polling after 60, seconds.
2015-10-21T20:35:23.139424+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:0:scan_power_supplies:ps 0 marked bad, sys mask: 1, psu_mask f#012
2015-10-21T20:36:44.364090+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:0:scan_power_supplies:ps 1 marked bad, sys mask: 3, psu_mask f#012
2015-10-21T20:37:51.385312+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:0:scan_power_supplies:ps 2 marked bad, sys mask: 7, psu_mask f#012
2015-10-21T20:39:07.027963+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:0:scan_power_supplies:ps 3 marked bad, sys mask: f, psu_mask f#012
2015-10-21T20:39:07.028077+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:0:scan_power_supplies:All PS marked bad, attempting to reset through other IOM#012
2015-10-21T20:39:23.479998+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-004b: read 128 bytes offset 0 to 4b, err -16
2015-10-21T20:39:47.126256+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-004f: read 128 bytes offset 0 to 4f, err -16
2015-10-21T20:40:00.182760+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-0056: read 128 bytes offset 0 to 56, err -16
2015-10-21T20:40:07.721113+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-005d: read 128 bytes offset 0 to 5d, err -16
2015-10-21T20:40:21.576658+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-004b: read 128 bytes offset 0 to 4b, err -16
2015-10-21T20:40:43.179415+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-004f: read 128 bytes offset 0 to 4f, err -16
2015-10-21T20:40:59.809516+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-0056: read 128 bytes offset 0 to 56, err -16
2015-10-21T20:41:05.899218+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-005d: read 128 bytes offset 0 to 5d, err -16
2015-10-21T20:41:21.434477+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-004b: read 128 bytes offset 0 to 4b, err -16
2015-10-21T20:41:40.027845+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-004f: read 128 bytes offset 0 to 4f, err -16
2015-10-21T20:42:00.060439+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-0056: read 128 bytes offset 0 to 56, err -16
2015-10-21T20:42:06.125467+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-005d: read 128 bytes offset 0 to 5d, err -16
2015-10-21T20:42:06.126910+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:1429:scan_power_supplies:Restarting ps polling after 60, seconds.
2015-10-21T20:43:16.547468+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:0:scan_power_supplies:ps 0 marked bad, sys mask: 1, psu_mask f#012
2015-10-21T20:44:27.687322+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:0:scan_power_supplies:ps 1 marked bad, sys mask: 3, psu_mask f#012
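
A rough Python sketch for summarizing OBFL lines like the ones above: it counts the kernel `at24` read failures per I2C address and decodes the `sys mask` value into a set of PSU indexes. Two assumptions to flag: the errno names are the standard Linux values (-16 is EBUSY, -11 is EAGAIN, which matches the "Resource temporarily unavailable" text in the IOM 1 log), and reading bit n of `sys mask` as "PSU n marked bad" is inferred from the 1, 3, 7, f progression in the log rather than from any Cisco documentation. The `summarize` function name is made up for this example.

```python
import re
from collections import Counter

# Standard Linux errno values; kernel messages print them negated.
ERRNO_NAMES = {11: "EAGAIN", 16: "EBUSY"}

AT24_RE = re.compile(r"at24 (\d+)-([0-9a-f]{4}): read \d+ bytes .* err -(\d+)")
MASK_RE = re.compile(r"ps (\d+) marked bad, sys mask: ([0-9a-f]+), psu_mask ([0-9a-f]+)")

def summarize(lines):
    """Return (at24 error counts keyed by (bus, addr, errno name), PSU indexes marked bad)."""
    at24_errors = Counter()
    bad_psus = set()
    for line in lines:
        m = AT24_RE.search(line)
        if m:
            bus, addr, err = m.groups()
            name = ERRNO_NAMES.get(int(err), f"errno {err}")
            at24_errors[(int(bus), addr, name)] += 1
            continue
        m = MASK_RE.search(line)
        if m:
            sys_mask = int(m.group(2), 16)
            psu_mask = int(m.group(3), 16)
            # Assumption: bit n set in sys mask means PSU n is marked bad.
            bad_psus = {n for n in range(psu_mask.bit_length()) if (sys_mask >> n) & 1}
    return at24_errors, bad_psus

LINES = [
    "2015-10-21T20:35:23.139424+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:0:scan_power_supplies:ps 0 marked bad, sys mask: 1, psu_mask f#012",
    "2015-10-21T20:36:44.364090+00:00 CMC NOCSN_dmserver-3-CMC  OBFL:0:scan_power_supplies:ps 1 marked bad, sys mask: 3, psu_mask f#012",
    "2015-10-21T20:39:23.479998+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-004b: read 128 bytes offset 0 to 4b, err -16",
    "2015-10-21T20:39:47.126256+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-004f: read 128 bytes offset 0 to 4f, err -16",
    "2015-10-21T20:40:21.576658+00:00 CMC NOCSN_kernel-3-CMC  OBFL: at24 1-004b: read 128 bytes offset 0 to 4b, err -16",
]

errors, bad = summarize(LINES)
print(errors)
print(sorted(bad))
```

In the full log the same four addresses (4b, 4f, 56, 5d) fail in rotation with EBUSY, which lines up with the four PSU FRU entries timing out in the I2C device statistics.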

 

And IOM 2 is able to see fan status, whereas IOM 1 is showing Unknown.

 

IOM2 logs:

interval:        15                # seconds
write_ts:        1445460326        # Wed Oct 21 14:45:26 2015
stale_ts:        1445460348        # Wed Oct 21 14:45:48 2015 STALE
now:                1445460364        # Wed Oct 21 14:46:04 2015
status:                1                # ACTIVE
policy_state:        1                # COOL
xreading:        1                # DEVELOPER_MODE: FALSE
hwconf_valid:        1
maxfans:                8
fan[1].fault/read/req:        0/30/30        # OK
fan[2].fault/read/req:        0/30/30        # OK
fan[3].fault/read/req:        0/30/30        # OK
fan[4].fault/read/req:        0/30/30        # OK
fan[5].fault/read/req:        0/30/30        # OK
fan[6].fault/read/req:        0/30/30        # OK
fan[7].fault/read/req:        0/30/30        # OK
fan[8].fault/read/req:        0/30/30        # OK

 

I have re-seated both IOMs in the last day but that hasn't cleared up the errors.

 

 

Ryan,

So the norxack values are fairly low, not high enough for me to be too concerned.

The only thing I take issue with in that I2C log would be some of those timeout values, particularly IOM 2's PSU segment.

 

Does IOM1's thermal.log file show something similar to the below?

fan[1].fault/read/req:    3/0/30    # UNKNOWN
fan[2].fault/read/req:    3/0/30    # UNKNOWN
fan[3].fault/read/req:    3/0/30    # UNKNOWN
fan[4].fault/read/req:    3/0/30    # UNKNOWN
fan[5].fault/read/req:    3/0/30    # UNKNOWN
fan[6].fault/read/req:    3/0/30    # UNKNOWN
fan[7].fault/read/req:    3/0/30    # UNKNOWN
fan[8].fault/read/req:    3/0/30    # UNKNOWN

 

Yep, exactly. IOM 1 was reseated about 5 hours ago; maybe that reset the norxack counters, not sure.

 

valid:                1
pid:                1335
interval:        15                # seconds
write_ts:        1445461350        # Wed Oct 21 15:02:30 2015
stale_ts:        1445461372        # Wed Oct 21 15:02:52 2015 OK
now:                1445461360        # Wed Oct 21 15:02:40 2015
status:                2                # PASSIVE
policy_state:        1                # COOL
xreading:        1                # DEVELOPER_MODE: FALSE
hwconf_valid:        1
maxfans:                8
fan[1].fault/read/req:        3/0/30        # UNKNOWN
fan[2].fault/read/req:        3/0/30        # UNKNOWN
fan[3].fault/read/req:        3/0/30        # UNKNOWN
fan[4].fault/read/req:        3/0/30        # UNKNOWN
fan[5].fault/read/req:        3/0/30        # UNKNOWN
fan[6].fault/read/req:        3/0/30        # UNKNOWN
fan[7].fault/read/req:        3/0/30        # UNKNOWN
fan[8].fault/read/req:        3/0/30        # UNKNOWN
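
The thermal.log excerpts in this thread can be checked mechanically. Below is a small Python sketch that reads the `key: value` header fields and the `fan[N].fault/read/req` lines, and flags a file as stale when `now` is later than `stale_ts`, which is consistent with the STALE and OK annotations in the pastes. Treating the three fan numbers as fault code, reported speed, and requested speed is an assumption based on the field label, and `parse_thermal` is a hypothetical helper name.

```python
import re

FIELD_RE = re.compile(r"^(\w+):\s+(\S+)", re.M)
FAN_RE = re.compile(r"fan\[(\d+)\]\.fault/read/req:\s+(\d+)/(\d+)/(\d+)")

# Excerpts from the two thermal.log pastes above, trimmed to two fans each.
ACTIVE_SAMPLE = """\
write_ts:        1445460326        # Wed Oct 21 14:45:26 2015
stale_ts:        1445460348        # Wed Oct 21 14:45:48 2015 STALE
now:                1445460364        # Wed Oct 21 14:46:04 2015
status:                1                # ACTIVE
fan[1].fault/read/req:        0/30/30        # OK
fan[2].fault/read/req:        0/30/30        # OK
"""

PASSIVE_SAMPLE = """\
write_ts:        1445461350        # Wed Oct 21 15:02:30 2015
stale_ts:        1445461372        # Wed Oct 21 15:02:52 2015 OK
now:                1445461360        # Wed Oct 21 15:02:40 2015
status:                2                # PASSIVE
fan[1].fault/read/req:        3/0/30        # UNKNOWN
fan[2].fault/read/req:        3/0/30        # UNKNOWN
"""

def parse_thermal(text):
    """Return header fields, per-fan (fault, read, req) tuples, and a staleness flag."""
    fields = dict(FIELD_RE.findall(text))
    fans = {
        int(n): (int(fault), int(read), int(req))
        for n, fault, read, req in FAN_RE.findall(text)
    }
    # The file counts as stale once the current time has passed stale_ts.
    stale = int(fields["now"]) > int(fields["stale_ts"])
    return {"fields": fields, "fans": fans, "stale": stale}

for name, sample in (("active", ACTIVE_SAMPLE), ("passive", PASSIVE_SAMPLE)):
    result = parse_thermal(sample)
    print(name, "stale" if result["stale"] else "fresh", result["fans"])
```

In both pastes `stale_ts` is `write_ts` plus 22 seconds with a 15 second interval, so a STALE flag suggests at least one missed update.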

 

After re-seating IOM2 last night I saw it come up after 20-25 minutes as expected, but HA was in a not-ready state with 'Chassis Configuration Incomplete'. The last time I saw that, I simply re-seated a fan module and it came up, so I did that this morning. Instead of finishing, all but fans #1 and #2 turned off. The servers were heating up, so I tried re-seating IOM1, and luckily that brought the fans up. Right now they are spinning as they should (not at 100%).

This is our only chassis, so while I do have planned downtime this evening, I'm trying to keep it going until then.

 

It appears the full chassis power cycle fixed the errors.

Ryan,

Good to hear. Presumably that bus is looking slightly cleaner now?

I don't think that was the cause of the issue you were seeing, but it would be difficult to get more details on this one outside of an actual case. It sounds like you have one open, so hopefully you can get some more info from your engineer.

Yes, I sent in a full Tech Support Log dump this morning, so hopefully I'll hear back from them. At least now I can see all the stats for the PSUs and fans. No more EBUSY errors in the I2C logs either. I do still have some norxack counts, but overall they are much lower.

 

# I2C Bus Statistics Thu Oct 22 09:34:16 CST 2015
# I2C Bus 1
busn=0 nseg=2
segment 0 local
        wait_gt_deadline 1
segment 1 extended
        wait_gt_deadline 2
error_pca9541_per_device:
# I2C Bus 2
busn=1 nseg=5
segment 0 local
segment 1 chassis
        norxack 192
        wait_gt_deadline 3
segment 2 blade
segment 3 fan
        norxack 3
        wait_gt_deadline 247
segment 4 psu
        unfinished 3
        fixup 3
        pca9541clrerrprs 2
        pca9541postio3 1
        wait_gt_deadline 104

 

# I2C Bus Statistics Thu Oct 22 09:36:01 CST 2015
# I2C Bus 1
busn=0 nseg=2
segment 0 local
segment 1 extended
        wait_gt_deadline 2
error_pca9541_per_device:
# I2C Bus 2
busn=1 nseg=5
segment 0 local
segment 1 chassis
        norxack 198
        wait_gt_deadline 1
segment 2 blade
segment 3 fan
segment 4 psu
        wait_gt_deadline 1

The only thing I'm curious about, which maybe you can help answer: on the subordinate IOM I still see Unknown for thermal status, while the primary IOM looks fine:

 

magic:                0x486f7403        # OK
valid:                1
pid:                1637
interval:        15                # seconds
write_ts:        1445528182        # Thu Oct 22 09:36:22 2015
stale_ts:        1445528204        # Thu Oct 22 09:36:44 2015 OK
now:                1445528187        # Thu Oct 22 09:36:27 2015
status:                2                # PASSIVE
policy_state:        1                # COOL
xreading:        1                # DEVELOPER_MODE: FALSE
hwconf_valid:        1
maxfans:                8
fan[1].fault/read/req:        3/0/30        # UNKNOWN
fan[2].fault/read/req:        3/0/30        # UNKNOWN
fan[3].fault/read/req:        3/0/30        # UNKNOWN
fan[4].fault/read/req:        3/0/30        # UNKNOWN
fan[5].fault/read/req:        3/0/30        # UNKNOWN
fan[6].fault/read/req:        3/0/30        # UNKNOWN
fan[7].fault/read/req:        3/0/30        # UNKNOWN
fan[8].fault/read/req:        3/0/30        # UNKNOWN

 

-Edit- I guess I should state my question: is it normal behavior for the IOMs to show OK on one and Unknown on the other?

 

 

Ryan,

I believe that it is expected for the subordinate to not pull that fan information (instead we rely on the current primary), thus the 'UNKNOWN' status you see.
