07-24-2014 10:34 AM - edited 08-29-2017 03:03 PM
The type of memory area in a process can be classified as follows:
Memory Area Description
Program text area | Program Text | Area where program instructions in a process are stored |
Program data area | Program Data | Area where program data in a process (such as static variables) are stored |
Program stack area | Program Stack | Area where program stacks in a process (such as local variables) are stored |
DLL text area | DLL Text | Area where program instructions in a DLL loaded by a process are stored |
DLL data area | DLL Data | Area where program data in a DLL loaded by a process (such as static variables) are stored |
Heap area | Dynamic | Area used for dynamic allocation in a process |
Shared memory area | Shared | Area shared by multiple processes |
Crashed process will be restarted automatically and all the memory leak in the process will be restored.
dumper[59]: %OS-DUMPER-7-DUMP_REQUEST : Dump request for process pkg/bin/snmpd dumper[59]: %OS-DUMPER-7-DUMP_ATTRIBUTE : Dump request with attribute 7 for process pkg/bin/snmpd dumper[59]: %OS-DUMPER-4-SIGNAL_NUMBER : Thread 1 received SIGNAL 31, si code 0, si errno 0 dumper[59]: %OS-DUMPER-4-SIGNALCORE_INFO : Core for pid = 278830 (pkg/bin/snmpd) as signal 31 sent by pkg/bin/snmpd@node0_RSP0_CPU0 dumper[59]: %OS-DUMPER-7-PROC_PAGES : Process memory pages 925 dumper[59]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice: 0(harddisk:/dumper) in use dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : r0 r1 r2 r3 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R0 00000004 e7fffa10 5002b0a0 00000001 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : r4 r5 r6 r7 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R4 e7fffaa8 00000001 e7fffa30 00000183 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : r8 r9 r10 r11 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R8 412e0001 00000001 00000001 501e24a9 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : r12 r13 r14 r15 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R12 501ddce4 5002b0a0 e7fffb60 00000001 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : r16 r17 r18 r19 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R16 e7fffb74 e7fffb7c e7fffc18 00000000 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : r20 r21 r22 r23 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R20 0051626c 00000000 00000000 50028860 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : r24 r25 r26 r27 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R24 00000000 e7fffa30 e7fffaec ec017384 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : r28 r29 r30 r31 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R28 50033684 00000001 00000001 e7fffaa8 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : cnt lr msr pc dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R32 00000000 4c242678 0000d932 4c2c4ec4 dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : cnd xer dumper[59]: %OS-DUMPER-7-REGISTERS_INFO : R36 44002042 20000000 dumper[59]: %OS-DUMPER-7-TRACE_BACK : #0 0x4c2c4ec4 dumper[59]: %OS-DUMPER-7-TRACE_BACK : #1 0x4c242678 dumper[59]: %OS-DUMPER-7-TRACE_BACK : #2 0x4c242ac4 dumper[59]: %OS-DUMPER-7-TRACE_BACK : #3 0x4c2457ec dumper[59]: %OS-DUMPER-7-TRACE_BACK : #4 0x40011bbc dumper[59]: %OS-DUMPER-7-TRACE_BACK : #5 0x4001346c dumper[59]: %OS-DUMPER-7-TRACE_BACK : #6 0x40086bc0 -- snip -- dumper[59]: %OS-DUMPER-6-CORE_MD5 : harddisk:/dumper/first.snmpd_1113.by.snmpd.node0_RSP0_CPU0.ppc.Z has just been written. size:78133573 md05:04 AMc931525d07a348370ac12d08584a3d -- snip --
If a process crashes due to signal 31, process coredump will be collected and saved under hard disk automatically.
Collect the console log described above and the following files.
harddisk:/dumper/first.snmpd_1113.by.snmpd.node0_RSP0_CPU0.ppc.Z
harddisk:/dumper/first.snmpd_1113.by.snmpd.node0_RSP0_CPU0.ppc.txt
harddisk:/dumper/first.snmpd_1113.by.snmpd.node0_RSP0_CPU0.ppc.cpu_info.Z
However, it may be difficult to identify the root cause of the memory leak only by the above coredump files.
For this reason, it is important to detect problems as early as possible and collect required logs several time while in a problematic state.
The first step for memory leak troubleshooting is to identify which nods (RSP/LC) and which process the memory leak is observed in.
The following approaches can be available to identify memory leak processes:
Check if the dynamic memory usage per process is increasing continuously by show memory compare commands.
Collect a start snapshot first
RP/0/RP0/CPU0:CRS#show memory compare start
Successfully stored memory snapshot /harddisk:/malloc_dump/memcmp_start.out
(*) A start snapshot file will be created under the hard disk as shown above.
Collect an end snapshot next
RP/0/RP0/CPU0:CRS#show memory compare end Successfully stored memory snapshot /harddisk:/malloc_dump/memcmp_end.out (*) An end snapshot file will be created under the hard disk as shown above.
Compare two start/end snapshots
RP/0/RP0/CPU0:CRS#show memory compare report JID name mem before mem after difference mallocs restart/exit/new --- ---- ---------- --------- ---------- ------- ---------------- 117 arp 662916 663040 124 2 65819 exec 257480 257568 88 3 415 wdsysmon 4190296 4189240 -1056 -4 You are now free to remove snapshot memcmp_start.out and memcmp_end.out under /harddisk:/malloc_dump (*) Displays the difference between start/end snapshots created under the hard disk.
ItemExecutive Summary
JID | Job ID assigned to a process |
name | Process name |
mem before | Dynamic memory usage in a start snapshot (Bytes) |
mem after | Dynamic memory usage in an end snapshot (Bytes) |
difference | Increase/decrease of the dynamic memory usage between start and end snapshots (Bytes) |
mallocs | The difference between the number of memory allocation API calls between start and end snapshots |
restart/exit/new | A flag indicating whether the restart/exit/start process was executed between start and end snapshots |
Check If the dynamic memory usage per process is increasing continuously by collecting snapshots periodically via show process memory command.
RP/0/RSP0/CPU0:ASR9K#show processes memory location 0/RSP0/CPU0 JID Text Data Stack Dynamic Process ------ ---------- ---------- ---------- ---------- ------- 340 167936 110592 184320 22089728 parser_server 241 589824 970752 81920 14663680 iedged 1110 401408 131072 122880 13197312 ipv6_rib 1109 405504 131072 159744 12632064 ipv4_rib 1040 1056768 2867200 253952 10731520 bgp 1105 2396160 10485760 135168 8548352 l2vpn_mgr 60 110592 53248 139264 7630848 eth_server 1133 65536 20480 32768 7479296 schema_server 334 143360 69632 53248 7417856 nvgen_server 1011 1331200 598016 135168 6258688 ospf -- snip -- (*) Displays in descending order of the dynamic memory usage volume. Check to see if Dynamic is increasing continually.
After a memory leak process is identified, the next step is to identify the memory leak blocks in a process.
Collect process and DLL mapping information for a memory leak process first.
Collect a memory snapshot a couple of times and compare the difference among those snapshots.
Collect a process running coredump.
Check the process information of a memory leak process using show processes command.
RP/0/RSP0/CPU0:ASR9K#show processes snmpd Job Id: 1113 PID: 844710071 Executable path: /disk0/iosxr-infra-4.1.2/bin/snmpd Instance #: 1 Version ID: 00.00.0000 Respawn: ON Respawn count: 2 Max. spawns per minute: 12 -- snip --
Process ID (PID) can be changed each time a process is restarted, but Job ID (JID) remains the same even after process restart.
The above Job ID will be referred to <jid> in the command examples hereinafter.
Collect DLL mapping information of a memory leak process using show dll command.
RP/0/RSP0/CPU0:ASR9K#show dll <jid>
Collect the following three memory snapshots of process heap.
Collect a snapshot of heap summary in a process context.
RP/0/RSP0/CPU0:ASR9K#show memory heap dllname <jid> Malloc summary for pid 844710071 process snmpd: Heapsize 11325440: allocd 10365240, free 27120, overhead 933080, high watermark 11325440 Calls: mallocs 95070437; reallocs 964756; frees 94961917; [core-allocs 192; core-frees 7] -- snip -- Block Allocated List: Total Total Block Name/ID/Caller Usize Size Count 0x003d2ba8 0x0040cc08 0x00007325 0x40020fd4 <N/A> 0x0022118f 0x0026b558 0x00007325 0x40020fe8 <N/A> 0x00100270 0x001002c8 0x00000009 GSP_MEM 0x00051860 0x00059ca0 0x0000104e 0x400283dc <N/A> 0x00046374 0x000535b0 0x0000161c 0x4e894c58 snmp_object_create_list 0x000212a0 0x0002c380 0x0000161c 0x4e894ccc snmp_object_create_list 0x0001e000 0x0001e008 0x00000001 0x4c5de600 chk_messaging_init_msgbuf 0x0001e000 0x0001e008 0x00000001 0x4c5de5b0 chk_messaging_init_msgbuf 0x00018316 0x000323d8 0x0000216f 0x4c2f6178 strdup 0x00015acf 0x000210e8 0x0000108d 0x4006c8a8 <N/A> -- snip --
Collect a snapshot of heap blocks in a process context.
RP/0/RSP0/CPU0:ASR9K#show memory heap allocated dllname <jid> Block Allocated List: Usize Size Address Name/ID/Caller 0x0003fff0 0x0003fff8 0x501b6024 GSP_MEM 0x0003fff0 0x0003fff8 0x50359024 GSP_MEM 0x0003fff0 0x0003fff8 0x5042a024 GSP_MEM 0x0003fff0 0x0003fff8 0x504b3024 GSP_MEM 0x0001e000 0x0001e008 0x50080024 0x4c5de600 chk_messaging_init_msgbuf 0x0001e000 0x0001e008 0x5009f024 0x4c5de5b0 chk_messaging_init_msgbuf 0x00008400 0x00008408 0x50043aec 0x4c5dee70 chk_messaging_init 0x00007a58 0x00007a60 0x501f7024 [chunk NH RULES CHUNK hdr]: 304 bytes, 100 per blk (a=0/f=100/h=0) (blks 0, 0)][NH RULES CHUNK chunk data: 0x501f7078 free: 100] 0x00005a44 0x00005a4c 0x502d1024 0x4e8948f4 snmp_object_add_object 0x0000413c 0x00004144 0x506c4024 0x4007f12c <N/A> -- snip -- 0x00000088 0x000000a0 0x500d25c4 0x40020fd4 <N/A> 0x00000088 0x000000a0 0x50199504 0x40020fd4 <N/A> 0x00000088 0x000000a0 0x50277038 0x40020fd4 <N/A> 0x00000088 0x000000a0 0x50278e04 0x40020fd4 <N/A> 0x00000088 0x000000a0 0x5027b304 0x40020fd4 <N/A> 0x00000088 0x000000a0 0x503407d8 0x40020fd4 <N/A> -- snip -- 0x00000050 0x00000058 0x50706424 0x40020fe8 <N/A> 0x0000004d 0x00000068 0x50199284 0x40020fe8 <N/A> 0x0000004d 0x00000068 0x5023dab4 0x40020fe8 <N/A> 0x0000004d 0x00000068 0x5027887c 0x40020fe8 <N/A> 0x0000004d 0x00000068 0x5027b914 0x40020fe8 <N/A> -- snip --
Collect memory summary in a process context.
RP/0/RSP0/CPU0:ASR9K#show memory <jid> Physical Memory: 4096M total (1988M available) Application Memory : 3915M (1988M available) Image: 51M (bootram: 51M) Reserved: 128M, IOMem: 2028M, flashfsys: 0 -- snip -- pkg/bin/snmpd: jid 1113 Address Bytes What 40000000 557056 Program Text 4c000000 67108864 DLL Text 50000000 159744 Program/DLL Data 50027000 4096 Program/DLL Data 50028000 1961984 Allocated Memory 50207000 36864 Physical Mapped Memory 50210000 131072 Allocated Memory 50237000 2011136 Allocated Memory 50429000 7254016 Allocated Memory 50b1c000 98304 Allocated Memory 50b39000 1277952 Allocated Memory 60100000 212992 Shared Memory 60134000 98304 Shared Memory 6014c000 40960 Shared Memory -- snip -- ec51a000 4096 Program/DLL Data ec745000 8192 Program/DLL Data ec748000 8192 Program/DLL Data ec74b000 20480 Program/DLL Data ec751000 4096 Program/DLL Data Total Allocated Memory: 12734464 Total Shared Memory: 844582912
You can check if the number of allocated memory blocks has increased/decreased by comparing start/end snapshots of show memory heap dllname.
You can see the increase in the following example (in red).
Start snapshot
RP/0/RSP0/CPU0:ASR9K#show memory heap dllname <jid> Malloc summary for pid 844710071 process snmpd: Heapsize 11325440: allocd 10365240, free 27120, overhead 933080, high watermark 11325440 Calls: mallocs 95070437; reallocs 964756; frees 94961917; [core-allocs 192; core-frees 7] -- snip -- Block Allocated List: Total Total Block Name/ID/Caller Usize Size Count 0x003d2ba8 0x0040cc08 0x00007325 0x40020fd4 <N/A> 0x0022118f 0x0026b558 0x00007325 0x40020fe8 <N/A> 0x00100270 0x001002c8 0x00000009 GSP_MEM 0x00051860 0x00059ca0 0x0000104e 0x400283dc <N/A> 0x00046374 0x000535b0 0x0000161c 0x4e894c58 snmp_object_create_list 0x000212a0 0x0002c380 0x0000161c 0x4e894ccc snmp_object_create_list -- snip --
End snapshot
RP/0/RSP0/CPU0:ASR9K#show memory heap dllname <jid> Malloc summary for pid 844710071 process snmpd: Heapsize 13783040: allocd 12554380, free 47572, overhead 1181088, high watermark 13783040 Calls: mallocs 122449585; reallocs 1248003; frees 122315443; [core-allocs 247; core-frees 20] -- snip -- Block Allocated List: Total Total Block Name/ID/Caller Usize Size Count 0x004f19a8 0x0053c9dc 0x000094e5 0x40020fd4 <N/A> 0x002c1237 0x003211d8 0x000094e5 0x40020fe8 <N/A> 0x00100270 0x001002c8 0x00000009 GSP_MEM 0x00051860 0x00059ca0 0x0000104e 0x400283dc <N/A> 0x00046374 0x000535b0 0x0000161c 0x4e894c58 snmp_object_create_list 0x0002fa12 0x00063210 0x0000420b 0x4c2f6178 strdup 0x000212a0 0x0002c380 0x0000161c 0x4e894ccc snmp_object_create_list -- snip --
You may be able to identify the cause of memory leak by comparing the Start and End snapshots, but you may need to collect process running coredump in some cases.
Collect process running coredump
RP/0/RSP0/CPU0:ASR9K#dumpcore running <jid>
Run the above command and then collect the following coredump files:
harddisk:/dumper/first.snmpd_<jid>.by.dumper_gen.node0_RSP0_CPU0.ppc.Z harddisk:/dumper/first.snmpd_<jid>.by.dumper_gen.node0_RSP0_CPU0.ppc.txt harddisk:/dumper/first.snmpd_<jid>.by.dumper_gen.node0_RSP0_CPU0.ppc.cpu_info.Z
The name of the running coredump files includes by.dumper_gen, which is different from the case of process crash.
Original Document: https://supportforums.cisco.com/ja/document/11935591
Author: Taisuke Sasaki
Posted on October 1, 2013
I came across the same issue with sshd_child_handler after upgrading ASR9001 to IOS XR Software, Version 6.3.3. Still not sure how to fix this. Will be escalating to the TAC.
Find answers to your questions by entering keywords or phrases in the Search bar above. New here? Use these resources to familiarize yourself with the community: