02-25-2011 09:44 PM - edited 03-06-2019 03:46 PM
I worked out my issue with getting data to pass through my new nm-1ge module, but now, shortly after switching over to it and giving it a fair bit of traffic, the router crashes.
I've attached the crash info file; hopefully someone can help.
02-26-2011 05:53 AM
This is what the Cisco Output Interpreter suggests (red indicates an error condition that could cause the crash). You might want to check the references.
SHOW BUFFERS ANALYSIS
ERROR: Since its last reload, this router has created or maintained a relatively
large number of 'Middle buffers' yet still has very few free buffers.
The above symptoms suggest that a buffer leak has occurred.
BUFFER LEAK: When a process is finished with a buffer, the process should free the
buffer. A buffer leak occurs when the code forgets to process a buffer, or forgets
to free it after it is done with the packet. As a result, the buffer pool continues
to grow as more and more packets are stuck in the buffers. Some routers (for example,
2600, 3600, and 4000 Series) require a minimum amount of I/O memory to support
certain interface processors.
Not Enough Shared Memory for the Interfaces.
NOTE:
(1) Some of the Public Buffer pools should be abnormally large with few free buffers.
After a reload, you may see that the number of free buffers never gets close to
the number of total buffers.
(2) You should check the buffers on a regular basis. Some leaks are slow but others
are very fast.
(3) If you configure or access the router through telnet, you need to check the buffers
on a regular basis via remote access (telnet) before the router hangs, to see which
pool is leaking. Once you see that for one pool the total number is increasing
and the free number is low (the faulty pool), you need to capture a 'show buffer
pool dump'. But if you don't have any memory available on the box, it's too late
to collect the information. You have to collect the information before the hang.
TRY THIS:
If the router is running low on shared memory even after a reload, physically removing
interfaces solves the problem.
This could be a Cisco IOS software bug. Upgrade to the latest version in your release
train to fix known buffer leak bugs. For example, if you are running Cisco IOS
Software Release 11.2(14), upgrade to the latest 11.2(x).
If you need assistance with the IOS upgrade and software download, please check
the URL below: Software Download Center
Commands to check the additional information about the content of the buffers:
show buffer pool (small - middle - big - verybig - large - huge): shows a summary
of the buffers for the specified pool.
show buffer pool (small - middle - big - verybig - large - huge) dump: shows a hex/ASCII
dump of all the buffers of a given pool.
show tech-support of the router.
How to identify a pool that has a problem:
(a) If the number of misses and creates increases at a high rate (as a % of hits)
(b) If there is a consistently low number of buffers in the free list
(c) If the number of failures or the no memory count increases
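The three checks above can be sketched in code. This is only a rough illustration: the `PoolCounters` class, the `leak_symptoms` function, and the two threshold values are hypothetical, not Cisco-defined, and the counters are assumed to have been copied by hand from two 'show buffers' samples of the same pool.

```python
from dataclasses import dataclass


@dataclass
class PoolCounters:
    """Counters for one public buffer pool, read from one 'show buffers' sample."""
    name: str
    total: int
    in_free_list: int
    hits: int
    misses: int
    created: int
    failures: int
    no_memory: int


def leak_symptoms(prev, curr, miss_ratio=0.05, free_ratio=0.10):
    """Compare an earlier and a later sample of the same pool.

    miss_ratio and free_ratio are hypothetical thresholds chosen for this
    sketch; they are not values published by Cisco.
    """
    symptoms = []
    new_hits = curr.hits - prev.hits
    new_misses = (curr.misses - prev.misses) + (curr.created - prev.created)
    # (a) misses and creates growing quickly as a share of hits
    if new_hits > 0 and new_misses / new_hits > miss_ratio:
        symptoms.append("misses/creates rising fast")
    # (b) few free buffers in both samples, i.e. consistently low
    low_before = prev.total > 0 and prev.in_free_list / prev.total < free_ratio
    low_now = curr.total > 0 and curr.in_free_list / curr.total < free_ratio
    if low_before and low_now:
        symptoms.append("free list consistently low")
    # (c) failures or no-memory counters increasing
    if curr.failures > prev.failures or curr.no_memory > prev.no_memory:
        symptoms.append("failures or no-memory increasing")
    return symptoms
```

A pool that reports several symptoms across successive samples is the one worth capturing with 'show buffer pool ... dump' before the router hangs.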
REFERENCE: For more information see Troubleshooting Buffer Leaks
REFERENCE: For more information see Troubleshooting Memory Problems
INFO: The buffer counters can be cleared only by reloading the router.
INFO: Interfaces use the 'interface buffer' pools for input and output (I/O). When
there are no more buffers in the interface buffer free list, the router goes to
the public buffer pools as a fallback. Performance is not affected in case of a
fallback. Interface buffers should not be tuned.
Here is the output field terminology for the 'show buffers' command:
- HITS: The number of buffers that have been requested from the buffer pool.
This counter provides a mechanism to determine which pool must meet the
highest demand for buffers.
- MISSES: The number of times buffers have been requested, but the processor
has detected a demand for additional buffers, and has been forced to create
them. Thus this counter represents the number of times the router has been
forced to create additional buffers.
- MAX-ALLOWED: The maximum number of buffers in the free-list. If the number of
buffers 'in free list' is greater than the 'max-allowed' value, the router will
attempt to trim buffers from the pool. The 'max-allowed' parameter is used to
prevent a pool from monopolizing buffers that it does not need anymore and free
this memory back to the system for further use.
- FREE-LIST: The number of buffers in the pool, ready for use.
- MIN: The minimum number of buffers from the pool at any given time.
- TRIMS: When the value 'in free list' exceeds that of 'max allowed' the processor
trims the buffers.
- CREATED: The number of buffers that are created when the free-list is less
than the minimum buffers allowed, or is of zero value.
- FAILURES: The number of times a buffer request could not be satisfied, even
after an attempt to create additional buffers. This counter represents the
number of packets that have been dropped due to buffer shortage.
- TOTAL: The total number of used and unused buffers.
- PERMANENT: Identifies the permanent number of allocated buffers in the pool,
that cannot be trimmed away.
- NO MEMORY: The number of failures caused by insufficient memory to create
additional buffers.
- INITIAL: The temporary buffers allotted during system reload and for session
establishments.
- MAX-FREE & MIN-FREE: The maximum and minimum number of free buffers.
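As a rough illustration of how the MAX-ALLOWED, MIN, TRIMS and CREATED descriptions fit together, the free-list rule can be written as one small function. The function name and return strings are hypothetical, and this is a simplification of the field descriptions above, not IOS's actual buffer management code.

```python
def free_list_action(in_free_list, min_free, max_allowed):
    """Return the action the field descriptions imply for a pool's free list."""
    if in_free_list > max_allowed:
        return "trim"    # surplus buffers are released back to the system
    if in_free_list < min_free:
        return "create"  # the pool is topped up with new buffers
    return "none"        # free list is within its configured bounds
```

For example, a pool with 0 buffers in the free list and a minimum of 10 would be in the "create" state, which is what drives the CREATED counter up during a leak.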
SHOW MEMORY NOTIFICATIONS (if any)
INFO: Processor memory utilization is 5.05527%.
INFO: Processor memory or main memory stores the running configuration and routing
tables. The Cisco IOS software executes from main memory.
INFO: The amount of processor memory required by the router is affected by the Cisco
IOS version used, the size of the network and by the access list configurations.
Ensure that an optimal IOS version has been chosen.
INFO: The smallest amount of free processor memory used since the last boot is 382489572
byte(s).
INFO: The size of largest amount of processor memory free block currently available
is 381889500 byte(s).
INFO: For detailed memory analysis with respect to specific processes, consider
pasting "show processes memory" output to Output Interpreter.
INFO: If you are trying to determine the amount of installed memory on your device,
paste the output of "show version" to Output Interpreter.
REFERENCE: For more information see Troubleshooting Memory Problems.
SHOW PROCESS CPU NOTIFICATIONS (if any)
CPU Utilization is 0% (less than 20%) and there are no problems to report.
REFERENCE: For more information, see
High CPU Utilization on Catalyst 2900XL/3500XL Switches
Troubleshooting High CPU Utilization due to Processes
SHOW PROCESS MEMORY NOTIFICATIONS (if any)
INFO: The output of 'show process memory' only shows the memory associated with
the processor and does not identify other memory such as I/O, Fast, VM, etc. To
receive a statistical analysis on these types of memory, submit the first page of
output from the 'show memory' and 'show version' commands to Output Interpreter.
NOTE: The types of memory vary depending on router platform and installed modules.
INFO: Processor memory utilization is 5.02564%.
INFO: Processor memory or main memory stores the running configuration and routing
tables. The Cisco IOS software executes from main memory.
INFO: The amount of processor memory required by the router is affected by the Cisco
IOS version used, the size of the network and by the access list configurations.
Ensure that an optimal IOS version has been chosen.
INFO: The top 3 processes that are holding less than 1 MB of memory are:
'VLAN Manager' is holding 446580 bytes
'EEM ED Syslog' is holding 273864 bytes
'QOS_MODULE_MAIN' is holding 255060 bytes
HTH,
Ian
02-26-2011 07:51 AM
I disabled QoS, inserted the GBIC again, and then gave the commands to remove the config from fa 0/0 and apply it to gi2/0.
A second later it crashed with the following:
%ERR-1-GT64120 (PCI-1): Fatal error, Parity error on master read
GT=0x24000000, cause=0x00100000, mask=0x00D01D00, real_cause=0x00100000
bus_err_high=0x00000000, bus_err_low=0x00000000, addr_decode_err=0x00000470
cpu_err_data_high=0xFFFFFFFF, cpu_err_data_low=0xFFFFFFFF, cpu_err_parity=0x000000FF
r0 = FFFFFFFF r1 = FFFFFFFF r2 = 0 r3 = 64A20000 r4 = 0
r5 = 65B18780 r6 = 0 r7 = 3E000000 r8 = 0 r9 = 3E8
r10 = 0 r11 = 3E8 r12 = 0 r13 = 1 r14 = 0
r15 = 6 r16 = 0 r17 = F4240 r18 = 0 r19 = 1
r20 = 0 r21 = 64E10000 r22 = 0 r23 = 65B17278 r24 = 0
r25 = 4 r26 = 0 r27 = 0 r28 = 0 r29 = 28B0A
r30 = FFFFFFFF r31 = D883D00D r32 = FFFFFFFF r33 = FFFFFFFF r34 = FFFFFFFF
r35 = FFFFFFFF r36 = FFFFFFFF r37 = FFFFFFFF r38 = FFFFFFFF r39 = FFFFFFFF
r40 = FFFFFFFF r41 = FFFFFFFF r42 = FFFFFFFF r43 = FFFFFFFF r44 = FFFFFFFF
r45 = FFFFFFFF r46 = FFFFFFFF r47 = FFFFFFFF r48 = 0 r49 = D
r50 = 0 r51 = 3E000000 r52 = 0 r53 = 0 r54 = 0
r55 = 0 r56 = FFFFFFFF r57 = FFFFFFFF r58 = 0 r59 = 65A28EA0
r60 = FFFFFFFF r61 = FFFFFFFF r62 = 0 r63 = 606CDDBC
sreg = 3401F903 mdlo_hi = 0 mdlo = 28B0A
mdhi_hi = 0 mdhi = 4 badvaddr_hi = FFFFFFFF
badvaddr = FFFFFFFF cause = FFFFFFFF epc_hi = 0
epc = 606CDE64 err_epc_hi = FFFFFFFF err_epc = FFFFFFFF
%ERR-1-FATAL: Fatal error interrupt, reloading
err_stat=0x0
=== Flushing messages (09:44:51 central Sat Feb 26 2011) ===
Queued messages:
09:44:51 central Sat Feb 26 2011: Interrupt exception, CPU signal 22, PC = 0x0
--------------------------------------------------------------------
Possible software fault. Upon reccurence, please collect
crashinfo, "show tech" and contact Cisco Technical Support.
--------------------------------------------------------------------
-Traceback=
$0 : 00000000, AT : 00000000, v0 : 00000000, v1 : 00000000
a0 : 00000000, a1 : 00000000, a2 : 00000000, a3 : 00000000
t0 : 00000000, t1 : 00000000, t2 : 00000000, t3 : 00000000
t4 : 00000000, t5 : 00000000, t6 : 00000000, t7 : 00000000
s0 : 00000000, s1 : 00000000, s2 : 00000000, s3 : 00000000
s4 : 00000000, s5 : 00000000, s6 : 00000000, s7 : 00000000
t8 : 00000000, t9 : 00000000, k0 : 00000000, k1 : 00000000
gp : 00000000, sp : 00000000, s8 : 00000000, ra : 00000000
EPC : 00000000, ErrorEPC : 00000000, SREG : 00000000
MDLO : 00000000, MDHI : 00000000, BadVaddr : 00000000
CacheErr : 00000000, DErrAddr0 : 00000000, DErrAddr1 : 00000000
DATA_START : 0x62D2ED00
Cause 00000000 (Code 0x0): Interrupt exception
Writing crashinfo to flash:crashinfo_20110226-154452
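The register values in the GT64120 line can be sanity-checked with a little bit arithmetic. The sketch below assumes that real_cause is simply cause ANDed with mask; that relationship is an assumption that happens to match the numbers in this log, not something taken from GT64120 documentation, and no meaning is assigned to the individual bits.

```python
# Values copied from the %ERR-1-GT64120 line in the crash output.
cause = 0x00100000
mask = 0x00D01D00
real_cause = 0x00100000

# Assumption: real_cause is the cause register filtered through the mask.
masked = cause & mask
print(f"cause & mask = 0x{masked:08X}")

# List which bit positions survive the mask (bit 0 is the least significant).
set_bits = [bit for bit in range(32) if (masked >> bit) & 1]
print("bits set:", set_bits)
```

Having the surviving bit position to hand can make it easier to compare this crash with later ones, or to quote when opening a support case.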
02-26-2011 08:00 AM
I'm afraid that might be the best way to go on this one: open a TAC case. They should be able to point you in the right direction.
Regards,
Ian
02-26-2011 07:58 AM
On trying to do show tech it crashed.
Which port is PCI-1?
Is that an AIM or an NM?
The router is a 3745 with a nm-1ge in the bottom left slot and a nm-16es-1g-p in the bottom right slot.
02-26-2011 08:22 AM
I can't say for sure. Cisco usually starts at 0 from the right for slots and modules. If it's the same for PCI, then 1 would be the card on the left. But like I say, I'm not sure.
Regards,
Ian
02-26-2011 09:31 AM
I think it's the chassis.
If I put any NM in the top 2 slots I get a boot loop with it saying pci-1.
The seller of the nm-1ge load tested it in a 3800 prior to shipping.
From the info, does that sound about right?
02-26-2011 10:13 AM
Hi,
I am not too sure if I have missed something, but from the crashinfo I can see that "%ERR-1-FATAL: Fatal error interrupt" has occurred.
Explanation: This error message indicates a Hardware problem in the device.
1. Reload the device with no modules installed and check if the message appears.
2. Reload the device with each successive module and check to see if a certain
module or mis-seated module is causing this issue.
This way we will be able to find out the cause of the failure and replace the part.
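The two steps above amount to a simple elimination procedure, sketched below. The function name, the `boots_cleanly` callback and the module names are all hypothetical; in practice the "callback" is a person installing hardware, reloading, and watching the console for the fatal error.

```python
def find_suspect_modules(modules, boots_cleanly):
    """Add modules one at a time and report those that trigger the error.

    boots_cleanly is a hypothetical callback: given the list of installed
    modules, it returns True if the router reloads without the fatal error.
    """
    # Step 1: reload with no modules installed.
    if not boots_cleanly([]):
        return ["chassis"]  # error with nothing installed points at the chassis
    # Step 2: add each module in turn and reload again.
    installed, suspects = [], []
    for module in modules:
        if boots_cleanly(installed + [module]):
            installed.append(module)  # keep known-good modules in place
        else:
            suspects.append(module)   # remove it and carry on with the rest
    return suspects
```

Called with something like `find_suspect_modules(["module-a", "module-b"], check)`, it returns the modules whose installation brought the error back. Repeating the run with each module in a different slot would help separate a bad module from a bad slot.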
Can you please tell me what IOS version is on the device?
Regards,
Sonal
02-26-2011 10:57 AM
There are only two modules in it.
The other one (nme-16es-1g-p) is handling inter-VLAN routing and has been working fine for about a year.
There are 2 AIM modules in it (one AIM seems to not be working, as its status LED is amber; I've tried multiple AIMs in that slot and none have gone to status green, so I think either 2 VPN AIMs isn't supported or the main board is faulty).
When I initially got the nme-16es-1g-p I put it in slot 4, then slot 3, and finally found slot 1 to work properly.
Then just this week I got the nm-1ge and tried slot 3, then remembered last time and tried slot 2.
Using slots 1 and 2 doesn't cause a boot loop.
I think I had tried the nme-16es-1g-p in slot 2 and I think slot 2 caused a boot loop (though I'm just now remembering that part, so I might be wrong).
ios is Version 12.4(25d)
rommon is Version 12.2(8r)T2
Right now, as long as the GBIC is not plugged into the nm-1ge, the router doesn't crash.
02-26-2011 11:14 AM
Here's the specs. I didn't read them all.
http://www.cisco.com/en/US/prod/collateral/routers/ps282/product_data_sheet09186a008009203f.html
It should support two AIMs without a problem. It doesn't say for sure, though, whether it supports 2 VPN AIMs, but nonetheless it's an AIM and it should, by my understanding. Maybe you're right and it's a main board problem. I had many of these issues on a 1700 router where one slot doesn't work.
The IOS is quite recent so it should support it. Even so and just to be sure, can you put the latest IOS on it?
Regards,
Ian
02-26-2011 03:33 PM
It has the latest IOS that comes up for it when I browse through the available IOS releases.
02-27-2011 03:10 AM
You're right. In that case, see if you can open a TAC case.
Regards,
Ian
03-04-2011 07:58 AM
Well, the problem seems to be solved now.
It seems I had an unsupported setup, which I'll explain, listing the related items:
1. 3745 /w 512dram
2. rommon ver 12.2(8r)T2
I stumbled across an article while looking for the part number for the inline power supply to enable PoE on the EtherSwitch, and the article said I needed a newer rommon version for the 512dram to be stable (though it's odd that it went for several years without issue, but it had issues with anything other than nm-1 being filled).