08-26-2015 10:21 PM - edited 03-05-2019 02:10 AM
Hi
We are a small internet service provider running two (2) identical Cisco 7513 routers for our BGP connections, with different providers on each unit. Both routers have started crashing and reloading, almost simultaneously. The problem started around 2 weeks ago and now recurs daily, about twice a day on both units. The units have been operating for years with the same configuration and settings, and no incident like this had been experienced until the last 2 weeks.
This is the current system information from sh ver:
Cisco Internetwork Operating System Software
IOS (tm) RSP Software (RSP-ISV-M), Version 12.3(1), RELEASE SOFTWARE (fc3)
Copyright (c) 1986-2003 by cisco Systems, Inc.
Compiled Thu 15-May-03 04:53 by dchih
Image text-base: 0x4001095C, data-base: 0x41DA2000
ROM: System Bootstrap, Version 11.1(8)CA1, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)
BOOTLDR: GS Software (RSP-BOOT-M), Version 11.1(8)CA1, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)
bgpr2 uptime is 26 minutes
System returned to ROM by bus error at PC 0x400F1578, address 0x62B38F16
System image file is "slot0:rsp-isv-mz.123-1.bin"
cisco RSP4 (R5000) processor with 262144K/2072K bytes of memory.
R5000 CPU at 200Mhz, Implementation 35, Rev 2.1, 512KB L2 Cache
Last reset from power-on
G.703/E1 software, Version 1.0.
G.703/JT2 software, Version 1.0.
X.25 software, Version 3.0.0.
Bridging software.
Chassis Interface.
5 VIP2 R5K controllers (6 FastEthernet)(16 Serial).
6 FastEthernet/IEEE 802.3 interface(s)
16 Serial network interface(s)
123K bytes of non-volatile configuration memory.
20480K bytes of Flash PCMCIA card at slot 0 (Sector size 128K).
8192K bytes of Flash internal SIMM (Sector size 256K).
Slave in slot 7 is running Cisco Internetwork Operating System Software
IOS (tm) RSP Software (RSP-DW-M), Version 12.3(1), RELEASE SOFTWARE (fc3)
Copyright (c) 1986-2003 by cisco Systems, Inc.
Compiled Thu 15-May-03 05:01 by dchih
Slave: Loaded from system
Slave: cisco RSP4 (R5000) processor with 262144K bytes of memory.
Configuration register is 0x2102
As additional info: when the problem started, the console also started recording logs of this kind. I'm not sure whether this is an attack coming from a single IP block and going to random addresses across our whole network:
*Aug 27 04:49:33.051: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.34(59134) -> 210.16.61.142(12345), 1 packet
*Aug 27 04:49:38.435: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.75(43036) -> 210.16.63.169(12345), 1 packet
*Aug 27 04:49:44.167: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.76(51418) -> 210.16.63.170(12345), 1 packet
*Aug 27 04:49:45.319: %SEC-6-IPACCESSLOGRL: access-list logging rate-limited or missed 1 packet
*Aug 27 04:50:03.439: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.147(45838) -> 210.16.61.6(12345), 1 packet
*Aug 27 04:50:17.959: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.74(52141) -> 210.16.14.13(12345), 1 packet
*Aug 27 04:50:43.327: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.230(53566) -> 210.16.0.18(12345), 1 packet
*Aug 27 04:50:47.135: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.118(36652) -> 210.16.0.155(12345), 1 packet
*Aug 27 04:50:51.015: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.141(50522) -> 210.16.62.242(12345), 1 packet
*Aug 27 04:50:55.831: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.32(58788) -> 210.16.20.178(12345), 1 packet
*Aug 27 04:51:14.351: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.213(44437) -> 210.16.61.72(12345), 1 packet
*Aug 27 04:51:41.171: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.29(38951) -> 210.16.61.137(12345), 1 packet
*Aug 27 04:51:56.979: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.27(45956) -> 210.16.62.128(12345), 1 packet
*Aug 27 04:52:00.779: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.165(52954) -> 210.16.5.167(12345), 1 packet
Although there are filters applied, I'm not sure whether I still need our upstream provider to assist us in filtering the source IP block.
Your kind assistance is greatly appreciated, good day.
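One way to check whether the denied packets really come from a single block is to tally the log lines by source network and destination port. The following is only a rough sketch: it uses three of the log lines above as sample input, and grouping sources by /24 is an assumption made for illustration, not something the router reports. The names `LOG`, `PATTERN`, and `summarize` are hypothetical.

```python
import re
from collections import Counter
from ipaddress import ip_network

# Sample input: three of the %SEC-6-IPACCESSLOGP lines from the console log above.
LOG = """\
*Aug 27 04:49:33.051: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.34(59134) -> 210.16.61.142(12345), 1 packet
*Aug 27 04:49:38.435: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.75(43036) -> 210.16.63.169(12345), 1 packet
*Aug 27 04:49:44.167: %SEC-6-IPACCESSLOGP: list 130 denied tcp 64.125.239.76(51418) -> 210.16.63.170(12345), 1 packet
"""

# Matches: list <acl> denied <proto> <src>(<sport>) -> <dst>(<dport>)
PATTERN = re.compile(
    r"%SEC-6-IPACCESSLOGP: list (\d+) denied (\w+) "
    r"(\d+\.\d+\.\d+\.\d+)\((\d+)\) -> (\d+\.\d+\.\d+\.\d+)\((\d+)\)"
)

def summarize(log_text):
    """Count denied packets per source /24 (assumed grouping) and per destination port."""
    src_blocks, dst_ports = Counter(), Counter()
    for match in PATTERN.finditer(log_text):
        _acl, _proto, src, _sport, _dst, dport = match.groups()
        # strict=False lets ip_network mask the host bits off the source address.
        src_blocks[str(ip_network(f"{src}/24", strict=False))] += 1
        dst_ports[int(dport)] += 1
    return src_blocks, dst_ports

src_blocks, dst_ports = summarize(LOG)
print("Source blocks:", dict(src_blocks))
print("Destination ports:", dict(dst_ports))
```

Feeding the full console log into `summarize` would show whether every source falls in the same block and whether the destination port is always the same, which is the information an upstream provider would need to filter on.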
08-27-2015 07:57 AM
Are you still having crashes? Did a crash file get generated to flash when it occurred? If it's happening constantly, you need to find out whether the fault is in the hardware or the software. But you're saying it's happening on 2 routers, so I would say you've triggered a bug. You could get the crash file and give it to TAC to diagnose, or just upgrade the software.
System returned to ROM by bus error at PC 0x400F1578, address 0x62B38F16
08-28-2015 12:22 AM
First of all, thank you for your response, sir.
Yes sir, we're still having frequent crashes. Today, between 5 AM and 3:20 PM, we've already had 4 crashes.
As for the crashinfo, I was able to get one from one of the routers, which I've attached.
I'm hoping to find upgraded firmware for this model.
08-28-2015 12:50 AM
Hi, OK, can you post the output of show stacks from the device, please? I can't fully decipher a crash file; I don't have the tools that TAC has. If someone from Cisco is on here, maybe they can, as they have internal tools, not available to the public, which help with that. But we should be able to see something in show stacks. If we can't identify it, it should be raised with TAC as well, if you have support in place. If not, you can try upgrading the software to avoid it. You're running ED software, which can be unstable; there should be an MD release available. For now, post the show stacks and we'll try to identify whether it's hardware or software. I'm still guessing software, as both routers were hit simultaneously with the same issue.
I also see interrupts in the crash output, so show stacks will hopefully help identify that.
08-28-2015 01:20 AM
Sir,
I've attached the sh stacks output from both routers. The crashinfo that I sent came from bgpr1.
Last night I set up a service router to replace bgpr1, moved the connections from bgpr1 over to the service router, and left bgpr1 up and running. When I checked this morning, the service router had been up for about 16 hours, and I noticed that bgpr1 had been up for the same duration. Just some added info.
08-28-2015 03:40 AM
It's a software issue; you need to upgrade to newer software. Without TAC, though, I can't tell you exactly which image, but I would just move up to newer software. These devices have been EOL since 2012, so you won't get any support from TAC on them. The image you're running is actually deferred on the website as well, meaning it had too many bugs to keep in production. I see a 12.4 there; I would go for that.
ERROR: This router was last restarted by a bus error: 'bus error at PC 0x400F1578, address 0x62B38F16'
The system encounters a bus error when the processor tries to access a memory location that either does not exist (a software error) or does not respond properly (a hardware problem).
TRY THIS: Issue a 'show region' and check if the address location (the 'address' part of the bus error - 0x62B38F16) falls within an existing address range.
If the address reported by the bus error does not fall within the ranges displayed in the 'show region' output, this means that the router was trying to access an invalid address. This indicates that it is a Cisco IOS Software problem. Paste the output from the 'show stacks' command to decode the output and identify the Cisco IOS Software bug that is causing the bus error.
If the address falls within one of the ranges in the 'show region' output, it means that the router was accessing a valid memory address, but the hardware corresponding to that address is not responding properly. This indicates a hardware problem. Try reseating the hardware belonging to this address range before attempting to replace it.
Use the Troubleshooting Bus Error Crashes document for additional assistance.
08-28-2015 04:08 AM
I tried running the suggested check on bgpr1, but it generated no region output, so I ran it on bgpr2 instead, with these results:
System returned to ROM by bus error at PC 0x400F1578, address 0x6346C462
bgpr2#sh region
Region Manager:
Start       End         Size(b)    Class  Media  Name
0x40000000  0x4FFFFFFF  268435456  Local  R/W    main
0x4001095C  0x41DA1999  31002686   IText  R/O    main:text
0x41DA2000  0x4285B41F  11244576   IData  R/W    main:data
0x4285B420  0x42B546DF  3117760    IBss   R/W    main:bss
0x42B546E0  0x42B746DF  131072     Local  R/W    main:fastheap
0x42B746E0  0x4FFFFFFF  222869792  Local  R/W    main:heap
0x80000000  0x87FFFFFF  134217728  Local  R/W    main:(main_k0)
0x88000000  0x88001FFF  8192       Iomem  REG    qa_k0
0x88002000  0x881FFFFF  2088960    Iomem  R/W    memd:(memd_k0)
0xA0000000  0xA7FFFFFF  134217728  Local  R/W    main:(main_k1)
0xA8000000  0xA8001FFF  8192       Iomem  REG    qa_k1
0xA8002000  0xA81FFFFF  2088960    Iomem  R/W    memd:(memd_k1)
0xE0000000  0xE0001FFF  8192       Iomem  REG    qa
0xE0002000  0xE01FFFFF  2088960    Iomem  R/W    memd
0xE8000000  0xE8001FFF  8192       Iomem  REG    qa:writethru
0xF0002000  0xF01FFFFF  2088960    Iomem  R/W    memd:(memd_bitswap)
0xF8002000  0xF81FFFFF  2088960    Iomem  R/W    memd:(memd_uncached)
bgpr2#
bgpr2#sh stacks
Minimum process stacks:
Free/Size Name
5672/6000 HPI Logger
5500/6000 Clock Server
11200/12000 Router Init
5128/12000 Init
10444/12000 Slave Server
5428/6000 RADIUS INITCONFIG
5684/6000 MDFS Reload
4944/6000 BGP Open
2460/3000 RSP memory size check
9720/12000 Virtual Exec
Interrupt level stacks:
Level  Called    Unused/Size  Name
1      31908504  7972/9000    Network Interrupt
2      111444    7836/9000    Network Status Interrupt
3      0         8692/9000    OIR interrupt
4      0         9000/9000    PCMCIA Interrupt
5      7738      8644/9000    Console Uart
6      4         9000/9000    Error Interrupt
7      3393082   8604/9000    NMI Interrupt Handler
System was restarted by bus error at PC 0x400F1578, address 0x6346C462
RSP Software (RSP-ISV-M), Version 12.3(1), RELEASE SOFTWARE (fc3)
Compiled Thu 15-May-03 04:53 by dchih
Image text-base: 0x4001095C, data-base: 0x41DA2000
Stack trace from system failure:
FP: 0x4333FB60, RA: 0x400F1578
FP: 0x4333FB88, RA: 0x40692F1C
FP: 0x4333FBD0, RA: 0x4069EB34
FP: 0x4333FBF0, RA: 0x40687F94
FP: 0x4333FC40, RA: 0x40688C20
FP: 0x4333FC68, RA: 0x40688E20
***************************************************
******* Information of Last System Crash **********
***************************************************
The last crashinfo failed to be written.
Please verify the exception crashinfo configuration
the filesytem devices, and the free space on the
filesystem devices.
Using crashinfo_FAILED.
%Error opening crashinfo_FAILED (File not found)
bgpr2#
Seems to be a software issue as suggested.
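For anyone repeating the check, the comparison of the bus error address against the sh region ranges can be sketched in a few lines of Python. This is only an illustration: the ranges are copied by hand from the bgpr2 output above, and the names `REGIONS` and `regions_containing` are made up for this example.

```python
# (start, end, name) tuples copied from the bgpr2 'sh region' output above.
REGIONS = [
    (0x40000000, 0x4FFFFFFF, "main"),
    (0x4001095C, 0x41DA1999, "main:text"),
    (0x41DA2000, 0x4285B41F, "main:data"),
    (0x4285B420, 0x42B546DF, "main:bss"),
    (0x42B546E0, 0x42B746DF, "main:fastheap"),
    (0x42B746E0, 0x4FFFFFFF, "main:heap"),
    (0x80000000, 0x87FFFFFF, "main:(main_k0)"),
    (0x88000000, 0x88001FFF, "qa_k0"),
    (0x88002000, 0x881FFFFF, "memd:(memd_k0)"),
    (0xA0000000, 0xA7FFFFFF, "main:(main_k1)"),
    (0xA8000000, 0xA8001FFF, "qa_k1"),
    (0xA8002000, 0xA81FFFFF, "memd:(memd_k1)"),
    (0xE0000000, 0xE0001FFF, "qa"),
    (0xE0002000, 0xE01FFFFF, "memd"),
    (0xE8000000, 0xE8001FFF, "qa:writethru"),
    (0xF0002000, 0xF01FFFFF, "memd:(memd_bitswap)"),
    (0xF8002000, 0xF81FFFFF, "memd:(memd_uncached)"),
]

def regions_containing(address, regions=REGIONS):
    """Return the names of all regions whose inclusive [start, end] range holds address."""
    return [name for start, end, name in regions if start <= address <= end]

# Values taken from the 'bus error at PC ..., address ...' lines in this thread.
for label, addr in [
    ("bgpr2 bus error address", 0x6346C462),
    ("earlier bus error address", 0x62B38F16),
    ("PC", 0x400F1578),
]:
    hits = regions_containing(addr)
    print(f"{label} 0x{addr:08X}: {', '.join(hits) if hits else 'not in any region'}")
```

Neither bus error address falls inside any listed range, while the PC sits inside main:text, which matches the "invalid address, so software" reading of the interpreter output.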
I will try to get hold of the 12.4 image, which is around 28 MB, and a bigger linear flash card, as my current card is barely big enough to hold the 22 MB 12.3 image.
08-28-2015 04:58 AM
12.4.3i or j needs the same DRAM and flash as what you're running now. They're MD releases as well, so they will be a lot more stable when it comes to bugs; those ED editions can be dodgy.
I would just move off that image as soon as possible to avoid another production reboot. You could always test 1 router for a day, and if it's stable you should be good to change the 2nd router.
08-28-2015 05:08 AM
Sir,
I'll try to work on your suggestion, but in case I can't get hold of 12.4.3i/j, is 12.4.3a stable?
08-28-2015 05:36 AM
It's an MD release as well, so I would say you would be a lot better off. MD (Maintenance Deployment) releases are fully tested; ED (Early Deployment) releases are early releases that can contain more bugs, and they're not as thoroughly tested.
08-28-2015 05:49 AM
Sir,
Thank you for the feedback and for your time. I'll work on this ASAP and report back after the upgrade.