07-30-2018 06:29 AM - edited 03-08-2019 03:46 PM
Hello.
Can somebody help me please!?
ROM: System Bootstrap, Version 12.2(17r)S4, RELEASE SOFTWARE (fc1)
BOOTLDR: Cisco IOS Software, c7600s72033_rp Software (c7600s72033_rp-ADVIPSERVICESK9-M), Version 15.1(2)S, RELEASE SOFTWARE (fc1)
METROAGG1 uptime is 3 hours, 23 minutes
Uptime for this control processor is 3 hours, 23 minutes
System returned to ROM by s/w reset at 22:07:14 UTC Mon Feb 27 2012 (SP by bus error at PC 0x4048B1F4, address 0x0)
System restarted at 13:03:39 EEDT Mon Jul 30 2018
System image file is "disk0:c7600s72033-advipservicesk9-mz.151-2.S.bin"
Last reload type: Normal Reload
cisco CISCO7609-S (R7000) processor (revision 1.0) with 983008K/65536K bytes of memory.
Processor board ID FOX1343GPS9
SR71000 CPU at 600MHz, Implementation 1284, Rev 1.2, 512KB L2 Cache
Last reset from s/w reset
2 Virtual Ethernet interfaces
242 Gigabit Ethernet interfaces
8 Ten Gigabit Ethernet interfaces
1917K bytes of non-volatile configuration memory.
8192K bytes of packet buffer memory.
I have had two reboots so far, with these reasons (the supervisor was rsp720-3c):
1. EEDT: %C7600_PLATFORM-2-PEER_RESET: RP is being reset by the SP
%Software-forced reload
2. EEDT: %CPU_MONITOR-3-PEER_EXCEPTION: CPU_MONITOR peer has failed due to exception , reset by [5/0]
%Software-forced reload
Then I changed the supervisor to sup720-3bxl, and it rebooted again with this reason:
3. EEDT: %SYS-SP-6-MEMDUMP: 0x80839F0: 0x1 0x0 0x1000001 0x517415F8
%Software-forced reload
07-30-2018 07:45 AM
The RP (Route Processor) is being reset by the SP (Switch Processor).
Please attach the crash info so we can better understand the reason.
Regards.
07-30-2018 07:49 AM - edited 07-30-2018 08:09 AM
Hello!
_20180728-121918 - %CPU_MONITOR-3-PEER_EXCEPTION
20180728-000112 - %C7600_PLATFORM-2-PEER_RESET
Since the problem recurred after I replaced the supervisor, I can think of only a couple of remaining suspects:
- slot?
- chassis?
Thanks.
07-30-2018 08:10 AM
Hi, in the first file I see
1700511: Jul 27 07:22:07.624 EEDT: %MAC_MOVE-SP-4-NOTIF: Host e446.da52.4b3c in vlan 172 is flapping between port Gi9/47 and port Gi9/31
1700512: Jul 27 08:50:48.055 EEDT: %MAC_MOVE-SP-4-NOTIF: Host 901b.0eed.0383 in vlan 124 is flapping between port Gi9/27 and port Gi9/32
1700513: Jul 28 00:01:12.411 EEDT: %C7600_PLATFORM-2-PEER_RESET: RP is being reset by the SP
MAC flapping is typically caused by a loop in the network.
In the second file I see
000093: Jul 28 12:19:18.690 EEDT: %CPU_MONITOR-3-PEER_EXCEPTION: CPU_MONITOR peer has failed due to exception , reset by [5/0]
Similar behaviour is described in bug CSCti22719, but that one seems to be related to a traffic pattern.
Check the network for loops and try a different IOS release.
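To see which hosts and port pairs flap most often, the MAC_MOVE messages can be tallied from a saved log. This is only a minimal sketch, assuming the log is available as plain text lines in the format quoted above; the function name `count_mac_flaps` is illustrative, not part of any Cisco tool.

```python
import re
from collections import Counter

# Pattern for the %MAC_MOVE-SP-4-NOTIF message format quoted in this thread.
MAC_MOVE_RE = re.compile(
    r"%MAC_MOVE-SP-4-NOTIF: Host (?P<mac>[0-9a-f.]+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)

def count_mac_flaps(lines):
    """Count flap notifications per (MAC, VLAN, sorted port pair)."""
    counts = Counter()
    for line in lines:
        m = MAC_MOVE_RE.search(line)
        if m:
            # Sort the ports so A<->B and B<->A count as the same pair.
            ports = tuple(sorted((m["a"], m["b"])))
            counts[(m["mac"], int(m["vlan"]), ports)] += 1
    return counts

# The three lines quoted from the first file.
sample = [
    "1700511: Jul 27 07:22:07.624 EEDT: %MAC_MOVE-SP-4-NOTIF: Host e446.da52.4b3c in vlan 172 is flapping between port Gi9/47 and port Gi9/31",
    "1700512: Jul 27 08:50:48.055 EEDT: %MAC_MOVE-SP-4-NOTIF: Host 901b.0eed.0383 in vlan 124 is flapping between port Gi9/27 and port Gi9/32",
    "1700513: Jul 28 00:01:12.411 EEDT: %C7600_PLATFORM-2-PEER_RESET: RP is being reset by the SP",
]

flaps = count_mac_flaps(sample)
for key, n in flaps.most_common():
    print(key, n)
```

A host that shows up many times on the same port pair is a good place to start looking for the loop.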
Regards.
07-30-2018 03:41 PM
I am fairly certain this crash is due to a memory leak, CSCtw80533. (As of 31 July 2018, there are >205 TAC Cases, so this bug is very, very well known.)
The chassis is running very, very old code: 15.1(2)S. This is version "0" (no number after the letter "S").
1700512: Jul 27 08:50:48.055 EEDT: %MAC_MOVE-SP-4-NOTIF: Host 901b.0eed.0383 in vlan 124 is flapping between port Gi9/27 and port Gi9/32
1700513: Jul 28 00:01:12.411 EEDT: %C7600_PLATFORM-2-PEER_RESET: RP is being reset by the SP
Look at the time and date. It looks like a silent leak. No log entries.
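The size of that quiet period can be worked out from the two timestamps. A small sketch, assuming both lines are from 2018 (the log lines themselves carry no year) and that timestamps use the millisecond format shown; `parse_log_time` is a made-up helper name.

```python
import re
from datetime import datetime

# Matches the "Jul 27 08:50:48.055" part of a log line with msec timestamps.
TS_RE = re.compile(r"([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}\.\d{3})")

def parse_log_time(line, year=2018):
    """Return the timestamp of a log line.

    The year is an assumption: these log lines do not include one.
    """
    m = TS_RE.search(line)
    if m is None:
        raise ValueError("no timestamp found in: " + line)
    month, day, clock = m.groups()
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")

last_message = "1700512: Jul 27 08:50:48.055 EEDT: %MAC_MOVE-SP-4-NOTIF: Host 901b.0eed.0383 in vlan 124 is flapping between port Gi9/27 and port Gi9/32"
reset_message = "1700513: Jul 28 00:01:12.411 EEDT: %C7600_PLATFORM-2-PEER_RESET: RP is being reset by the SP"

# Time between the last ordinary log entry and the reset.
gap = parse_log_time(reset_message) - parse_log_time(last_message)
print("Silent period before the reset:", gap)
```

That is a little over 15 hours with nothing logged before the RP was reset.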
My recommendation is to upgrade the firmware of the chassis to something more recent.
07-30-2018 10:29 PM
08-13-2018 10:31 PM
Hello!
I upgraded the software to c7600rsp72043-adventerprisek9-mz.155-3.S4.bin. The system then ran for about 2 weeks and rebooted again.
016252: Aug 13 22:29:46.192 EEDT: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode
016253: Aug 13 22:29:46.192 EEDT: %OIR-SP-3-PWRCYCLE: Card in module 6, is being power-cycled (Module reset)
016254: Aug 13 22:29:48.344 EEDT: %SNMP-5-MODULETRAP: Module 6 [Down] Trap
016255: Aug 13 22:30:23.404 EEDT: %C7600_PLATFORM-2-PEER_RESET: RP is being reset by the SP
%Software-forced reload
Any ideas?
08-13-2018 11:40 PM
08-13-2018 11:53 PM
08-14-2018 02:44 AM
000364: *Aug 13 23:44:00.715 EEDT: %CONST_DIAG-SP-3-HM_TEST_FAIL: Module 6 TestSPRPInbandPing consecutive failure count:26
000365: *Aug 13 23:44:00.715 EEDT: %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=8% RP=2% Traffic=1%
netint_thr_active[0], Tx_Rate[2301], Rx_Rate[6688], dev=1[IPv4, fail=1], 2[IPv4, fail=10], 3[IPv4, fail=21]
000366: *Aug 13 23:44:00.715 EEDT: %CONST_DIAG-SP-4-HM_TEST_WARNING: Sup switchover will occur after 10 consecutive failures
Raise a TAC case. Whatever is in module 6 is causing the supervisor card to reload.
08-14-2018 03:20 AM
08-14-2018 03:26 AM
08-22-2018 11:48 PM - edited 08-22-2018 11:51 PM
Hello!
I removed module 6. The system ran for 10 days and then rebooted again :(
Aug 23 09:05:39.193 EEDT: %SYS-SP-3-OVERRUN: Block overrun at 3C06A7D0 (red zone 04657188)
-Traceback= 81E8CD0z 83DC5D0z 83DD5D0z 83DD900z 83DDAE0z 840F648z 840A854z
Aug 23 09:05:39.193 EEDT: %SYS-SP-6-MTRACE: mallocfree: addr, pc
67E7F60,83E5554 67E7F60,83E55D8 67E7F60,30000084 67E7F60,83E5554
4403F00,6000001E 4403E10,83E5554 4403E10,83E55D8 4403E10,40000060
Aug 23 09:05:39.193 EEDT: %SYS-SP-6-MTRACE: mallocfree: addr, pc
67E7F60,83E55D8 67E7F60,30000084 67E7F60,83E5554 4403F00,6000001E
4403E10,83E5554 4403E10,83E55D8 4403E10,40000060 67E7F60,83E55D8
Aug 23 09:05:39.193 EEDT: %SYS-SP-6-BLKINFO: Corrupted redzone blk 3C06A7D0, words 136, alloc 824C83C, InUse, dealloc 0, rfcnt 1
-Traceback= 81E8CD0z 83B797Cz 83DC5E8z 83DD5D0z 83DD900z 83DDAE0z 840F648z 840A854z
Aug 23 09:05:39.193 EEDT: %SYS-SP-6-MEMDUMP: 0x3C06A7D0: 0xAB1234CD 0xFFFE0000 0x0 0xA3C94DC
Aug 23 09:05:39.193 EEDT: %SYS-SP-6-MEMDUMP: 0x3C06A7E0: 0x824C83C 0x3C06A910 0x3C06A6A4 0x80000088
Aug 23 09:05:39.193 EEDT: %SYS-SP-6-MEMDUMP: 0x3C06A7F0: 0x1 0xFFCA11F6 0x1000001 0x113A1654
%Software-forced reload
Aug 23 09:05:39.225 EEDT: %DIAG-SP-3-NO_DIAG_RUNNING: Module 5: Diagnostic is not running
09:05:39 EEDT Thu Aug 23 2018: Unexpected exception to CPU: vector 1500, PC = 0x840EEAC , LR = 0x840EE44
-Traceback= 0x840EEACz 0x840EE44z 0x83DD5D0z 0x83DD900z 0x83DDAE0z 0x840F648z 0x840A854z
Can it still be the fault of the chassis?
Unfortunately we do not have Cisco TAC support.
08-23-2018 12:54 AM
@aleks wrote:
%Software-forced reload
Looks like a software bug.
Eject whatever is in slot 6 and leave it out. Let's see if there are any more issues over the next, say, 30 days.
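If it is a software bug, the frames shared by the three tracebacks from the Aug 23 crash are the part worth comparing against bug descriptions. The sketch below only finds the common trailing frames; it assumes the "-Traceback=" lines are copied exactly as posted, and the helper names are invented for illustration. The addresses are specific to the running image, so they only mean something for that exact IOS file.

```python
def parse_traceback(line):
    """Turn a '-Traceback= ...' line into a list of integer addresses."""
    frames = line.partition("Traceback=")[2].split()
    # Frames look like '81E8CD0z' or '0x840EEACz'; drop the trailing 'z'.
    return [int(frame.rstrip("z"), 16) for frame in frames]

def common_trailing_frames(tracebacks):
    """Return the frames that every traceback shares at its end."""
    shared = []
    for frames in zip(*(reversed(tb) for tb in tracebacks)):
        if len(set(frames)) != 1:
            break
        shared.append(frames[0])
    return shared[::-1]

# The three traceback lines posted for the Aug 23 09:05:39 crash.
lines = [
    "-Traceback= 81E8CD0z 83DC5D0z 83DD5D0z 83DD900z 83DDAE0z 840F648z 840A854z",
    "-Traceback= 81E8CD0z 83B797Cz 83DC5E8z 83DD5D0z 83DD900z 83DDAE0z 840F648z 840A854z",
    "-Traceback= 0x840EEACz 0x840EE44z 0x83DD5D0z 0x83DD900z 0x83DDAE0z 0x840F648z 0x840A854z",
]

shared = common_trailing_frames([parse_traceback(line) for line in lines])
print(" ".join(format(addr, "X") for addr in shared))
```

All three tracebacks end in the same five frames, which suggests the overrun and the exception come from the same code path.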