12-28-2012 09:02 PM - edited 03-07-2019 10:49 AM
Looking for some support here. We've got a long-lingering, year-long issue that has spanned about 8 supervisor cards and a complete chassis swap. The 6509 acts as an ITN in our facility. At seemingly random points while traffic is flowing, the active sup card boots into ROMMON mode, which seriously disrupts our company. I'm able to move the two fiber pairs from the active supervisor card over to the secondary, and that usually buys us another random amount of time; today, however, it occurred within minutes of hooking up the fiber links. After sitting there for about 5 minutes, it booted into ROMMON. When this happens, I'm able to boot the sup card back to a good status. Previous remedial actions, other than replacing sup cards and the chassis, included checking the config register and making sure it was 0x2102. Previously it was not, so we corrected it and reloaded, and the change took; we thought this would fix the problem, until today. Please let me know what you see in these log files. Thank you so much.
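For reference, this is roughly how we've been checking and resetting the register (standard IOS commands; the new value only takes effect at the next reload, and if I remember right show bootvar also reports the standby sup's register when running redundant sups):

ITN0243#show version | include register
ITN0243#show bootvar
ITN0243#configure terminal
ITN0243(config)#config-register 0x2102
ITN0243(config)#end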
The point at which I believe the first supervisor card went into ROMMON:
6668: 12/28/12 15:27:58 5/-1: MAJ, GOLD, loopback_unused_port[1/43]: check_ether_packet failed
6669: 12/28/12 15:27:58 5/-1: MAJ, GOLD, check_ether_packet_core [1/43]: diag_hit_sys_limit returns TRUE! The current mod/port test may be skipped.
6670: 12/28/12 15:27:58 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [1/43]: fabric buffer rate low in lc_chan 9 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please reduc
6671: 12/28/12 15:27:58 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [1/43]: fabric buffer rate low in sup_chan 19 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please red
6672: 12/28/12 15:27:58 5/-1: MAJ, GOLD, DIAG_SYS_LIMIT (sp_last_busy_percent, tx_rate, rx_rate, sp_netint_thr) NOT HIT
6673: 12/28/12 15:27:58 5/-1: MAJ, GOLD, check_ether_packet_core[1/-1/43]: newpak is NULL!
6674: 12/28/12 15:27:57 5/-1: MAJ, GOLD, loopback_unused_port[1/43]: check_ether_packet failed
6675: 12/28/12 15:27:57 5/-1: MAJ, GOLD, check_ether_packet_core [1/43]: diag_hit_sys_limit returns TRUE! The current mod/port test may be skipped.
6676: 12/28/12 15:27:57 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [1/43]: fabric buffer rate low in lc_chan 9 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please reduc
6677: 12/28/12 15:27:57 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [1/43]: fabric buffer rate low in sup_chan 19 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please red
6678: 12/28/12 15:27:57 5/-1: MAJ, GOLD, DIAG_SYS_LIMIT (sp_last_busy_percent, tx_rate, rx_rate, sp_netint_thr) NOT HIT
6679: 12/28/12 15:27:57 5/-1: MAJ, GOLD, check_ether_packet_core[1/-1/43]: newpak is NULL!
6680: 12/28/12 15:27:55 5/-1: MAJ, GOLD, diag_get_fabric_status[8]: diag_hit_sys_limit: Test skipped!
6681: 12/28/12 15:27:55 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [8/1]: fabric buffer rate low in lc_chan 16 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please reduc
6682: 12/28/12 15:27:55 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [8/1]: fabric buffer rate low in sup_chan 19 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please redu
6683: 12/28/12 15:27:55 5/-1: MAJ, GOLD, DIAG_SYS_LIMIT (sp_last_busy_percent, tx_rate, rx_rate, sp_netint_thr) NOT HIT
6684: 12/28/12 15:27:55 5/-1: MAJ, GOLD, loopback_internal_port[8/6]: check_bpdu_packet failed (err-code 0)
6685: 12/28/12 15:27:55 5/-1: MAJ, GOLD, check_bpdu_packet_core[8/6]: diag_hit_sys_limit returns TRUE! The current mod/port test may be skipped.
6686: 12/28/12 15:27:55 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [8/6]: fabric buffer rate low in lc_chan 16 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please reduc
6687: 12/28/12 15:27:55 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [8/6]: fabric buffer rate low in sup_chan 19 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please redu
6688: 12/28/12 15:27:55 5/-1: MAJ, GOLD, DIAG_SYS_LIMIT (sp_last_busy_percent, tx_rate, rx_rate, sp_netint_thr) NOT HIT
6689: 12/28/12 15:27:55 5/-1: MAJ, GOLD, check_bpdu_packet_core[8/6]: newpak is NULL!
6690: 12/28/12 15:27:54 5/-1: MAJ, GOLD, loopback_internal_port[8/6]: check_bpdu_packet failed (err-code 0)
6691: 12/28/12 15:27:54 5/-1: MAJ, GOLD, check_bpdu_packet_core[8/6]: diag_hit_sys_limit returns TRUE! The current mod/port test may be skipped.
6692: 12/28/12 15:27:54 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [8/6]: fabric buffer rate low in lc_chan 16 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please reduc
6693: 12/28/12 15:27:54 5/-1: MAJ, GOLD, diag_check_fabchan_flow_control_status_ok [8/6]: fabric buffer rate low in sup_chan 19 with flow control status in_lo 1 in_hi 0 out_lo 0 out_hi 0! Flow control is occuring in the system. Please redu
6694: 12/28/12 15:27:54 5/-1: MAJ, GOLD, check_bpdu_packet_core[8/6]: newpak is NULL!
6695: 12/28/12 15:09:39 5/-1: MAJ, GOLD, diag_get_port_tx_count_changed: Tx count is unchanged for mod=4 port=8 count=0 prev_count=0
6696: 12/28/12 15:09:39 5/-1: MAJ, GOLD, diag_get_port_tx_count_changed: Tx count is unchanged for mod=4 port=7 count=0 prev_count=0
6697: 12/28/12 15:09:39 5/-1: MAJ, GOLD, diag_get_port_tx_count_changed: Tx count is unchanged for mod=4 port=6 count=0 prev_count=0
6698: 12/28/12 15:09:39 5/-1: MAJ, GOLD, diag_get_port_tx_count_changed: Tx count is unchanged for mod=4 port=5 count=0 prev_count=0
6699: 12/28/12 15:09:39 5/-1: MAJ, GOLD, diag_get_port_tx_count_changed: Tx count is unchanged for mod=4 port=4 count=0 prev_count=0
6700: 12/28/12 15:09:39 5/-1: MAJ, GOLD, diag_get_port_tx_count_changed: Tx count is unchanged for mod=4 port=3 count=0 prev_count=0
12-28-2012 09:09 PM
Are there any crashinfo files? If you have any, please post the latest three.
If you've done a complete chassis and supervisor swap, then the only suspect left is the IOS.
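If you're not sure where to look, the RP and the SP each write their own crashinfo file, usually named something like crashinfo_<date>-<time>; listing the flash devices should turn them up (filesystem names vary with the supervisor model, so check what dir actually shows on yours):

ITN0243#dir bootflash:
ITN0243#dir sup-bootdisk: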
12-28-2012 10:19 PM
It's going to be a couple of days until I can get the crashinfo, but I do have some info from another log.
NVRAM log:
01. 10/17/2012 15:47:15: pf_redun_negotiate:Initialized as ACTIVE processor
02. 10/17/2012 15:54:34: pf_redun_negotiate:Initialized as ACTIVE processor
03. 10/17/2012 15:57:40: oir_reload_chassis:Reload Chassis requested
04. 10/17/2012 16:00:08: pf_redun_negotiate:Initialized as ACTIVE processor
05. 10/27/2012 03:10:23: pf_redun_negotiate:Initialized as ACTIVE processor
06. 10/27/2012 06:17:05: pf_redun_announce_standby_init:Initialized as STANDBY processor
07. 10/27/2012 06:41:00: pf_redun_announce_standby_init:Initialized as STANDBY processor
08. 12/03/2012 00:06:52: pf_redun_track_switchover:STANDBY Supervisor switching-over to ACTIVE
09. 12/03/2012 00:06:52: oir_reload_peer:Card in slot 6, is being power-cycled
10. 12/03/2012 00:10:57: oir_reload_peer:Card in slot 6, is being power-cycled
11. 12/04/2012 11:56:22: oir_reload_peer:Card in slot 6, is being power-cycled
12. 12/15/2012 11:02:38: oir_cycle_bay:Card in module 7, is being power-cycled - off (Diagnostic Failure) (22)
13. 12/15/2012 11:06:19: oir_cycle_bay:Card in module 7, is being power-cycled - off (Diagnostic Failure) (22)
14. 12/20/2012 21:17:40: oir_reload_chassis:Reload Chassis requested
15. 12/21/2012 05:05:18: pf_redun_negotiate:Initialized as ACTIVE processor
16. 12/21/2012 00:28:08: oir_cycle_bay:Card in module 7, is being power-cycled - off (Diagnostic Failure) (22)
17. 12/21/2012 00:31:36: oir_cycle_bay:Card in module 7, is being power-cycled - off (Diagnostic Failure) (22)
18. 12/28/2012 15:48:17: nvlog_crash_hook:System crashed on slot 5. Check value of Rommon variable CRASHINFO
19. 12/29/2012 00:15:58: pf_redun_announce_standby_init:Initialized as STANDBY processor
20. 12/28/2012 17:03:11: pf_redun_track_switchover:STANDBY Supervisor switching-over to ACTIVE
21. 12/28/2012 17:03:11: oir_reload_peer:Card in slot 6, is being power-cycled
22. 12/28/2012 17:07:16: oir_reload_peer:Card in slot 6, is being power-cycled
23. 12/28/2012 17:13:36: fabric_error_handle:recurring errors: 1(ch1,side 6), 4(ch2,side 4) for this slot
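Entry 18 points at the ROMMON variable CRASHINFO. From the ROMMON prompt, typing set with no arguments lists all the variables, and whatever file CRASHINFO names can then be viewed from IOS once the sup is back up (the filename below is only a placeholder):

rommon 1 > set
 ...
 CRASHINFO=bootflash:crashinfo_<date>-<time>
 ...

ITN0243#more bootflash:crashinfo_<date>-<time>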
Version Info:
------------------ show version ------------------
Cisco IOS Software, s72033_rp Software (s72033_rp-ADVIPSERVICESK9_WAN-M), Version 12.2(33)SXI7, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2011 by Cisco Systems, Inc.
Compiled Mon 18-Jul-11 05:49 by prod_rel_team
ROM: System Bootstrap, Version 12.2(17r)SX7, RELEASE SOFTWARE (fc1)
ITN0243 uptime is 1 week, 21 hours, 8 minutes
Uptime for this control processor is 1 hour, 57 minutes
Time since ITN0243 switched to active is 1 hour, 24 minutes
System returned to ROM by Stateful Switchover at 15:48:23 PST Fri Dec 28 2012 (SP by bus error at PC 0x417CA66C, address 0x0)
System restarted at 16:31:40 PST Fri Dec 28 2012
System image file is "sup-bootdisk:s72033-advipservicesk9_wan-mz.122-33.SXI7.bin"
Last reload reason: reload
12-28-2012 10:27 PM
063941: Dec 28 15:38:48.641 PST: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 5 TestSPRPInbandPing consecutive failure count:5
063942: Dec 28 15:38:48.641 PST: %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=42% RP=5% Traffic=0%
netint_thr_active[0], Tx_Rate[33942], Rx_Rate[1276], dev=1[IPv4, fail=5], 2[IPv4, fail=5], 3[IPv4, fail=5], 4[IPv6, fail=5]
063943: Dec 28 15:38:48.641 PST: %CONST_DIAG-SP-4-HM_TEST_WARNING: Sup switchover will occur after 10 consecutive failures
063944: Dec 28 15:40:32.841 PST: %SVCLC-5-FWTRUNK: Firewalled VLANs configured on trunks
063945: Dec 28 15:41:40.946 PST: %PIM-5-DRCHG: DR change from neighbor 0.0.0.0 to 132.33.195.2 on interface Vlan195
063946: Dec 28 15:42:21.174 PST: %SVCLC-5-FWTRUNK: Firewalled VLANs configured on trunks
063947: Dec 28 15:43:21.019 PST: %PIM-5-DRCHG: DR change from neighbor 0.0.0.0 to 132.33.195.2 on interface Vlan195
063948: Dec 28 15:43:28.003 PST: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 5 TestSPRPInbandPing consecutive failure count:10
063949: Dec 28 15:43:28.011 PST: %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=37% RP=7% Traffic=0%
netint_thr_active[0], Tx_Rate[36545], Rx_Rate[6618], dev=1[IPv4, fail=10], 2[IPv4, fail=10], 3[IPv4, fail=10], 4[IPv6, fail=10]
063950: Dec 28 15:43:28.011 PST: %CONST_DIAG-SP-4-HM_TEST_WARNING: Sup switchover will occur after 10 consecutive failures
The crashinfo file you attached reads like a sad country-western song. Is this an FWSM/ASA? It's causing issues with your entire chassis.
I've got a strange feeling you have a backplane issue.
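If the backplane/fabric really is unhealthy, the switch should be keeping its own score; worth grabbing these the next time it acts up (standard 6500 fabric commands, output varies with the linecard mix):

ITN0243#show fabric errors
ITN0243#show fabric utilization
ITN0243#show fabric status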
12-28-2012 10:43 PM
Did a little digging: the FWSM is the Firewall Services Module, one of the blades in the 6509.
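show module lists every blade with its slot and status, so it will confirm where the FWSM sits and whether it is reporting Ok:

ITN0243#show module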
12-29-2012 01:16 AM
12. 12/15/2012 11:02:38: oir_cycle_bay:Card in module 7, is being power-cycled - off (Diagnostic Failure) (22)
13. 12/15/2012 11:06:19: oir_cycle_bay:Card in module 7, is being power-cycled - off (Diagnostic Failure) (22)
And this looks like something really is happening to the entire chassis backplane.
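The per-test breakdown for that module should say exactly which diagnostic it keeps failing (keywords may vary slightly by release):

ITN0243#show diagnostic result module 7 detail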