
3745 router crash

1_1
Level 1

I worked out my issue with getting data to pass through my new nm-1ge module, but now, shortly after switching over to it and giving it a fair bit of traffic, the router crashes.

I've attached the crash info file; hopefully someone can help.

12 Replies 12

IAN WHITMORE
Level 4

This is what the Cisco Output Interpreter suggests (red indicates an error situation which could cause the crash). You might want to check the references.

SHOW BUFFERS ANALYSIS
ERROR: Since its last reload, this router has created or maintained a relatively large number of 'Middle buffers' yet still has very few free buffers. The above symptoms suggest that a buffer leak has occurred.

BUFFER LEAK: When a process is finished with a buffer, the process should free the buffer. A buffer leak occurs when the code forgets to process a buffer, or forgets to free it after it is done with the packet. As a result, the buffer pool continues to grow as more and more packets are stuck in the buffers. Some routers (for example, 2600, 3600, and 4000 Series) require a minimum amount of I/O memory to support certain interface processors. Not Enough Shared Memory for the Interfaces.

NOTE:
(1) Some of the Public Buffer pools should be abnormally large with few free buffers. After a reload, you may see that the number of free buffers never gets close to the number of total buffers.
(2) You should check the buffers on a regular basis. Some leaks are slow but others are very fast.
(3) If you configure or access the router through telnet, you need to check the buffers on a regular basis via remote access (telnet) before the router hangs, to see which pool the leak is in. Once you see that for one pool the total number is increasing and the free number is low (the faulty pool), you need to capture a 'show buffer pool  dump'. But if you don't have any memory available on the box, it's too late to collect the information. You have to collect the information before the hang.

TRY THIS: If the router is running low on shared memory even after a reload, physically removing interfaces solves the problem. This could be a Cisco IOS software bug. Upgrade to the latest version in your release train to fix known buffer leak bugs. For example, if you are running Cisco IOS Software Release 11.2(14), upgrade to the latest 11.2(x).

If you need assistance with the IOS upgrade and software download, please check the below URL: Software Download Center

Commands to check the additional information about the content of the buffers:
  - show buffer pool (small - middle - big - verybig - large - huge): shows a summary of the buffers for the specified pool.
  - show buffer pool (small - middle - big - verybig - large - huge) dump: shows a hex/ASCII dump of all the buffers of a given pool.
  - show tech-support of the router.

How to identify the pool that has a problem:
(a) The number of misses and creates increases at a high rate (as a % of hits).
(b) The number of buffers in the free list is consistently low.
(c) If number of failure or number of  memory increases

REFERENCE: For more information see Troubleshooting Buffer Leaks
REFERENCE: For more information see Troubleshooting Memory Problems

INFO: The buffer counters can be cleared only by reloading the router.
INFO: Interfaces use the 'interface buffer' pools for input and output (I/O). When there are no more buffers in the interface buffer free list, the router goes to the public buffer pools as a fallback. Performance is not affected in case of a fallback. Interface buffers should not be tuned.

Here is the output field terminology for the 'show buffers' command:
  - HITS: The number of buffers that have been requested from the buffer pool. This counter provides a mechanism to determine which pool must meet the highest demand for buffers.
  - MISSES: The number of times buffers have been requested, but the processor has detected a demand for additional buffers, and has been forced to create them. Thus this counter represents the number of times the router has been forced to create additional buffers.
  - MAX-ALLOWED: The maximum number of buffers in the free-list. If the number of buffers 'in free list' is greater than the 'max-allowed' value, the router will attempt to trim buffers from the pool. The 'max-allowed' parameter is used to prevent a pool from monopolizing buffers that it does not need anymore and free this memory back to the system for further use.
  - FREE-LIST: The number of buffers in the pool, ready for use.
  - MIN: The minimum number of buffers from the pool at any given time.
  - TRIMS: When the value 'in free list' exceeds that of 'max allowed' the processor trims the buffers.
  - CREATED: The number of buffers that are created when the free-list is less than the minimum buffers allowed, or is of zero value.
  - FAILURES: The number of failures met by the packets when there was a failure in an attempt to create buffers even after additional buffers were created. This counter represents the number of packets that have been dropped due to buffer shortage.
  - TOTAL: The total number of used and unused buffers.
  - PERMANENT: Identifies the permanent number of allocated buffers in the pool, that cannot be trimmed away.
  - NO MEMORY: The number of failures caused by insufficient memory to create additional buffers.
  - INITIAL: The temporary buffers allotted during system reload and for session establishments.
  - MAX-FREE & MIN-FREE: The maximum and minimum number of free buffers.
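The leak criteria above (a pool that has grown well past its permanent size, a consistently low free list, and misses running high relative to hits) can be checked mechanically. The following is only a rough sketch: the text layout in SAMPLE is an assumption about what one pool looks like in 'show buffers' output, the numbers in it are made up for illustration (they are not from this router), and the thresholds miss_ratio and growth_factor are hypothetical values, not Cisco recommendations.

```python
import re

# Assumed layout of one pool in 'show buffers' output.
# The values are invented for illustration, not taken from the thread.
SAMPLE = """\
Middle buffers, 600 bytes (total 750, permanent 25):
     3 in free list (10 min, 150 max allowed)
     90000 hits, 4200 misses, 12 trims, 737 created
     35 failures (0 no memory)
"""

POOL_RE = re.compile(
    r"(?P<name>\w+) buffers, (?P<size>\d+) bytes "
    r"\(total (?P<total>\d+), permanent (?P<permanent>\d+)[^)]*\):\s+"
    r"(?P<free>\d+) in free list \((?P<min>\d+) min, (?P<max>\d+) max allowed\)\s+"
    r"(?P<hits>\d+) hits, (?P<misses>\d+) misses, "
    r"(?P<trims>\d+) trims, (?P<created>\d+) created\s+"
    r"(?P<failures>\d+) failures \((?P<no_memory>\d+) no memory\)"
)


def parse_pools(text):
    """Return one dict of counters per buffer pool found in the text."""
    pools = []
    for match in POOL_RE.finditer(text):
        fields = match.groupdict()
        pools.append({k: (v if k == "name" else int(v)) for k, v in fields.items()})
    return pools


def leak_suspects(pools, miss_ratio=0.01, growth_factor=4):
    """Name the pools that meet all three leak symptoms.

    miss_ratio and growth_factor are hypothetical thresholds.
    """
    suspects = []
    for pool in pools:
        grown = pool["total"] >= growth_factor * max(pool["permanent"], 1)
        starved = pool["free"] < pool["min"]
        missing = pool["hits"] > 0 and pool["misses"] / pool["hits"] >= miss_ratio
        if grown and starved and missing:
            suspects.append(pool["name"])
    return suspects


print(leak_suspects(parse_pools(SAMPLE)))
```

Since some leaks are slow, a single snapshot says little; running the same check on captures taken at intervals and watching whether "total" keeps climbing for one pool is closer to what the note above describes.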


SHOW MEMORY NOTIFICATIONS (if any)

INFO: Processor memory utilization is 5.05527%.
INFO: Processor memory or main memory stores the running configuration and routing
tables. The Cisco IOS software executes from main memory.
INFO: The amount of processor memory required by the router is affected by the Cisco
IOS version used, the size of the network and by the access list configurations.
Ensure that an optimal IOS version has been chosen.

INFO: The smallest amount of free processor memory used since the last boot is 382489572
byte(s).
INFO: The size of largest amount of processor memory free block currently available
is 381889500 byte(s).

INFO: For detailed memory analysis with respect to specific processes, consider
pasting "show processes memory" output to Output Interpreter.
INFO: If you are trying to determine the amount of installed memory on your device,
paste the output of "show version" to Output Interpreter.

REFERENCE: For more information see Troubleshooting Memory Problems.


SHOW PROCESS CPU NOTIFICATIONS (if any)

CPU Utilization is 0% (less than 20%) and there are no problems to report.

REFERENCE: For more information, see
  High CPU Utilization on Catalyst 2900XL/3500XL Switches
  Troubleshooting High CPU Utilization due to Processes


SHOW PROCESS MEMORY NOTIFICATIONS (if any)

INFO: The output of 'show process memory' only shows the memory associated with
the processor and does not identify other memory such as I/O, Fast, VM, etc. To
receive a statistical analysis on these types of memory, submit the first page of
output from the 'show memory' and 'show version' commands to Output Interpreter.
NOTE: The types of memory vary depending on router platform and installed modules.

INFO: Processor memory utilization is 5.02564%.
INFO: Processor memory or main memory stores the running configuration and routing
tables. The Cisco IOS software executes from main memory.
INFO: The amount of processor memory required by the router is affected by the Cisco
IOS version used, the size of the network and by the access list configurations.
Ensure that an optimal IOS version has been chosen.

INFO: The top 3 processes that are holding less than 1 MB of memory are:
  'VLAN Manager' is holding 446580 bytes
  'EEM ED Syslog' is holding 273864 bytes
  'QOS_MODULE_MAIN' is holding 255060 bytes

HTH,
Ian

I disabled QoS, inserted the GBIC again, and then gave the commands to remove the config from fa 0/0 and apply it to gi2/0.

A second later it crashed with the following:

%ERR-1-GT64120 (PCI-1): Fatal error, Parity error on master read
GT=0x24000000, cause=0x00100000, mask=0x00D01D00, real_cause=0x00100000
bus_err_high=0x00000000, bus_err_low=0x00000000, addr_decode_err=0x00000470
cpu_err_data_high=0xFFFFFFFF, cpu_err_data_low=0xFFFFFFFF, cpu_err_parity=0x000000FF
r0  = FFFFFFFF r1  = FFFFFFFF r2  = 0        r3  = 64A20000 r4  = 0
r5  = 65B18780 r6  = 0        r7  = 3E000000 r8  = 0        r9  = 3E8
r10 = 0        r11 = 3E8      r12 = 0        r13 = 1        r14 = 0
r15 = 6        r16 = 0        r17 = F4240    r18 = 0        r19 = 1
r20 = 0        r21 = 64E10000 r22 = 0        r23 = 65B17278 r24 = 0
r25 = 4        r26 = 0        r27 = 0        r28 = 0        r29 = 28B0A
r30 = FFFFFFFF r31 = D883D00D r32 = FFFFFFFF r33 = FFFFFFFF r34 = FFFFFFFF
r35 = FFFFFFFF r36 = FFFFFFFF r37 = FFFFFFFF r38 = FFFFFFFF r39 = FFFFFFFF
r40 = FFFFFFFF r41 = FFFFFFFF r42 = FFFFFFFF r43 = FFFFFFFF r44 = FFFFFFFF
r45 = FFFFFFFF r46 = FFFFFFFF r47 = FFFFFFFF r48 = 0        r49 = D
r50 = 0        r51 = 3E000000 r52 = 0        r53 = 0        r54 = 0
r55 = 0        r56 = FFFFFFFF r57 = FFFFFFFF r58 = 0        r59 = 65A28EA0
r60 = FFFFFFFF r61 = FFFFFFFF r62 = 0        r63 = 606CDDBC
sreg     = 3401F903 mdlo_hi    = 0        mdlo        = 28B0A
mdhi_hi  = 0        mdhi       = 4        badvaddr_hi = FFFFFFFF
badvaddr = FFFFFFFF cause      = FFFFFFFF epc_hi      = 0
epc      = 606CDE64 err_epc_hi = FFFFFFFF err_epc     = FFFFFFFF

%ERR-1-FATAL: Fatal error interrupt, reloading
err_stat=0x0


=== Flushing messages (09:44:51 central Sat Feb 26 2011) ===

Queued messages:

09:44:51 central Sat Feb 26 2011: Interrupt exception, CPU signal 22, PC = 0x0

--------------------------------------------------------------------
   Possible software fault. Upon reccurence,  please collect
   crashinfo, "show tech" and contact Cisco Technical Support.
--------------------------------------------------------------------


-Traceback=
$0 : 00000000, AT : 00000000, v0 : 00000000, v1 : 00000000
a0 : 00000000, a1 : 00000000, a2 : 00000000, a3 : 00000000
t0 : 00000000, t1 : 00000000, t2 : 00000000, t3 : 00000000
t4 : 00000000, t5 : 00000000, t6 : 00000000, t7 : 00000000
s0 : 00000000, s1 : 00000000, s2 : 00000000, s3 : 00000000
s4 : 00000000, s5 : 00000000, s6 : 00000000, s7 : 00000000
t8 : 00000000, t9 : 00000000, k0 : 00000000, k1 : 00000000
gp : 00000000, sp : 00000000, s8 : 00000000, ra : 00000000
EPC  : 00000000, ErrorEPC : 00000000, SREG     : 00000000
MDLO : 00000000, MDHI     : 00000000, BadVaddr : 00000000
CacheErr : 00000000, DErrAddr0 : 00000000, DErrAddr1 : 00000000
DATA_START : 0x62D2ED00
Cause 00000000 (Code 0x0): Interrupt exception

Writing crashinfo to flash:crashinfo_20110226-154452
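For anyone reading the GT64120 line in the dump above, a small sketch like the one below lists which bit positions are set in the cause and mask values. It assumes, based only on the numbers in this log, that real_cause is the cause register ANDed with the mask; what each bit position means has to come from the GT-64120 datasheet and is not decoded here.

```python
def set_bits(value):
    """Return the positions of the bits set in a 32-bit register value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


# Values copied from the %ERR-1-GT64120 line in the crash output.
cause = 0x00100000
mask = 0x00D01D00
real_cause = 0x00100000

print("cause bits:", set_bits(cause))
print("mask bits: ", set_bits(mask))

# Assumption: real_cause is cause filtered through mask.
print("cause & mask matches real_cause:", (cause & mask) == real_cause)
```

With these values only one cause bit is set, and it is one of the bits enabled in the mask.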

I'm afraid that might be the best way to go on this one: open a TAC case. They should be able to point you in the right direction.

Regards,

Ian

1_1
Level 1

On trying to do show tech, it crashed.

Which port is PCI-1?

Is that an AIM or an NM?

The router is a 3745 with an nm-1ge in the bottom left slot and an nm-16es-1g-p in the bottom right slot.

I can't say for sure. Cisco usually starts at 0 from the right for slots and modules... if it's the same for PCI, then 1 would be the card on the left. But like I say, I'm not sure.

Regards,

Ian

I think it's the chassis.

If I put any NM in the top 2 slots, I get a boot loop with it saying pci-1.

The seller of the nm-1ge load tested it in a 3800 prior to shipping.

From the info, does that sound about right?

Sonal Singh
Cisco Employee

Hi,


I am not too sure if I have missed something, but from the crashinfo I can see that "%ERR-1-FATAL: Fatal error interrupt" has occurred.


Explanation: This error message indicates a hardware problem in the device.

1. Reload the device with no modules installed and check if the message appears.
2. Reload the device with each successive module and check to see if a certain
     module or mis-seated module is causing this issue.


This way we will be able to find out the cause of the failure and replace the part.
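The isolation procedure above can be written down as a short sketch. Everything here is illustrative: boots_cleanly is a hypothetical callback standing in for a person reloading the router with a given set of modules and watching the console, and the module names in the usage lines are placeholders, not findings from this thread.

```python
def find_faulty_modules(modules, boots_cleanly):
    """Follow the two steps: boot bare, then add one module at a time.

    boots_cleanly(installed) is a hypothetical callback that returns True
    when the router reloads without the fatal error for that module set.
    """
    if not boots_cleanly([]):
        # The error appears with nothing installed, so suspect the chassis.
        return ["chassis"]

    installed, faulty = [], []
    for module in modules:
        if boots_cleanly(installed + [module]):
            installed.append(module)  # keep it in and move on to the next
        else:
            faulty.append(module)     # pull it back out and note it
    return faulty


# Simulated run with placeholder names: pretend "module-b" is the bad one.
def simulated_boot(installed):
    return "module-b" not in installed


print(find_faulty_modules(["module-a", "module-b"], simulated_boot))
```

A module flagged this way may also just be mis-seated or in a bad slot, so it is worth reseating it and trying another slot before replacing it.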

Can you please tell me what the IOS version on the device is?


Regards,

Sonal

There are only two modules in it.

The other one (nme-16es-1g-p) is handling inter-VLAN routing and has been working fine for about a year.

There are 2 AIM modules in it. One AIM seems not to be working, as its status LED is amber. I've tried multiple AIMs in that slot and none have gone to status green, so I think either 2 VPN AIMs isn't supported or the main board is faulty.

When I initially got the nme-16es-1g-p, I put it in slot 4, then slot 3, and finally found slot 1 to work properly.

Then just this week I got the nm-1ge and tried slot 3, then remembered last time and tried slot 2.

Using slots 1 and 2 doesn't cause a boot loop.

I think I had tried the nme-16es-1g-p in slot 2, and I think slot 2 caused a boot loop (though I'm just now remembering that part, so I might be wrong).

IOS is Version 12.4(25d).

ROMMON is Version 12.2(8r)T2.

Right now, as long as the GBIC is not plugged into the nm-1ge, the router doesn't crash.

Here's the specs. I didn't read them all.

http://www.cisco.com/en/US/prod/collateral/routers/ps282/product_data_sheet09186a008009203f.html

It should support two AIMs without a problem. It doesn't say for sure whether it supports 2 VPN AIMs, but an AIM is an AIM, so by my understanding it should. Maybe you're right and it's a main board problem. I had many of these issues on a 1700 router where one slot doesn't work.

The IOS is quite recent, so it should support it. Even so, and just to be sure, can you put the latest IOS on it?

Regards,

Ian

It has the latest IOS that comes up for it when I browse through the IOS images.

You're right... in that case, see if you can open a TAC case.

Regards,

Ian

Well, the problem seems to be solved now.

It seems I had an unsupported setup, which I'll explain, listing the related items:

1. 3745 w/ 512 MB DRAM

2. ROMMON version 12.2(8r)T2

I stumbled across an article while looking for the part number for the inline power supply to enable PoE on the EtherSwitch, and the article said I needed a newer ROMMON version for the 512 MB of DRAM to be stable (though it's odd that it went for several years without issue, but it had issues with anything other than nm-1 being filled).
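For comparing a ROMMON or IOS version string against whatever minimum an article calls for, a rough parser along these lines may help. It is a sketch under assumptions: the regular expression only covers strings shaped like the two in this thread, the field names are my own, and the minimum ROMMON version is not stated in the thread, so it has to be supplied by the reader.

```python
import re

# Matches strings shaped like "12.4(25d)" or "12.2(8r)T2" only.
VERSION_RE = re.compile(r"(\d+)\.(\d+)\((\d+)([a-z]*)\)([A-Z]*)(\d*)")


def parse_version(text):
    """Split a version string into (major, minor, maint, letter, train, rebuild)."""
    match = VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, maint, letter, train, rebuild = match.groups()
    return (int(major), int(minor), int(maint), letter, train, int(rebuild or 0))


def is_older(current, required):
    """Compare only the numeric major.minor(maint) part of two versions.

    This is a rough ordering; train letters and rebuild numbers are ignored.
    """
    return parse_version(current)[:3] < parse_version(required)[:3]


# The two version strings quoted in the thread.
print(parse_version("12.2(8r)T2"))
print(parse_version("12.4(25d)"))
```

To use it, call is_older("12.2(8r)T2", required) with the minimum ROMMON version named in the relevant Cisco document as the second argument.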
