cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Announcements
Webcast SD-WAN

IOS-XR: Process Memory Leak

7270
Views
10
Helpful
1
Comments

     

    Introduction

    Type of memory area in a process

    The type of memory area in a process can be classified as follows:

    Memory Area   Description

    Program text area

    Program Text

    Area where program instructions in a process are stored
    Program data area

    Program Data

    Area where program data in a process (such as static variables) are stored
    Program stack area

    Program Stack

    Area where program stacks in a process (such as local variables) are stored
    DLL text area

    DLL Text

    Area where program instructions in a DLL loaded by a process are stored
    DLL data area

    DLL Data

    Area where program data in a DLL loaded by a process (such as static variables) are stored
    Heap area

    Dynamic

    Area used for dynamic allocation in a process
    Shared memory area

    Shared

    Area shared by multiple processes

     

    What is dynamic memory leak?

    • Dynamic memory leak occurs in a process or DLLs loaded by a process due to unexpected software problems.
    • The amount of dynamic memory usage increases continuously or periodically and never decreases.

    Important Information about memory leak analysis

    • Whether the amount of dynamic memory usage is increasing by the memory leak (Isn't it a transient memory increase)?
    • In which node (RSP/LC) and in which process is the memory leak is detected?
    • What event triggers the memory leak (such as configuration changes or SNMP MIB polling, etc.)?
    • How fast and how much memory leak is observed (1 MByte/day, etc)?

    What is caused by memory leak?

    • Each process has the maximum amount of dynamic memory to be allocated.
    • The different maximum limit can be set to each process.
    • If the dynamic memory usage exceeds the maximum limit, a process may crash due to signal 31 (see the example below)
    • Crashed process will be restarted automatically and all the memory leak in the process will be restored.

    dumper[59]: %OS-DUMPER-7-DUMP_REQUEST : Dump request for process pkg/bin/snmpd
    dumper[59]: %OS-DUMPER-7-DUMP_ATTRIBUTE : Dump request with attribute 7 for process pkg/bin/snmpd
    dumper[59]: %OS-DUMPER-4-SIGNAL_NUMBER : Thread 1 received SIGNAL 31, si code 0, si errno 0
    dumper[59]: %OS-DUMPER-4-SIGNALCORE_INFO : Core for pid = 278830 (pkg/bin/snmpd) as signal 31 sent by pkg/bin/snmpd@node0_RSP0_CPU0
    dumper[59]: %OS-DUMPER-7-PROC_PAGES : Process memory pages 925
    dumper[59]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice:  0(harddisk:/dumper) in use
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :             r0       r1       r2       r3
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R0   00000004  e7fffa10  5002b0a0  00000001  
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :             r4       r5       r6       r7
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R4   e7fffaa8  00000001  e7fffa30  00000183  
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :             r8       r9      r10      r11
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R8   412e0001  00000001  00000001  501e24a9  
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :            r12      r13      r14      r15
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R12  501ddce4  5002b0a0  e7fffb60  00000001  
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :            r16      r17      r18      r19
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R16  e7fffb74  e7fffb7c  e7fffc18  00000000  
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :            r20      r21      r22      r23
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R20  0051626c  00000000  00000000  50028860  
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :            r24      r25      r26      r27
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R24  00000000  e7fffa30  e7fffaec  ec017384  
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :            r28      r29      r30      r31
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R28  50033684  00000001  00000001  e7fffaa8  
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :            cnt       lr      msr       pc
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R32  00000000  4c242678  0000d932  4c2c4ec4  
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :            cnd      xer
    dumper[59]: %OS-DUMPER-7-REGISTERS_INFO :   R36  44002042  20000000  
    dumper[59]: %OS-DUMPER-7-TRACE_BACK : #0  0x4c2c4ec4
    dumper[59]: %OS-DUMPER-7-TRACE_BACK : #1  0x4c242678
    dumper[59]: %OS-DUMPER-7-TRACE_BACK : #2  0x4c242ac4
    dumper[59]: %OS-DUMPER-7-TRACE_BACK : #3  0x4c2457ec
    dumper[59]: %OS-DUMPER-7-TRACE_BACK : #4  0x40011bbc
    dumper[59]: %OS-DUMPER-7-TRACE_BACK : #5  0x4001346c
    dumper[59]: %OS-DUMPER-7-TRACE_BACK : #6  0x40086bc0
    -- snip --
    dumper[59]: %OS-DUMPER-6-CORE_MD5 : harddisk:/dumper/first.snmpd_1113.by.snmpd.node0_RSP0_CPU0.ppc.Z has just been written. size:78133573 md05:04 AMc931525d07a348370ac12d08584a3d 
    -- snip --
    

    If a process crashes due to signal 31, process coredump will be collected and saved under hard disk automatically.

    Collect  the console log described above and the following files.

    harddisk:/dumper/first.snmpd_1113.by.snmpd.node0_RSP0_CPU0.ppc.Z
    harddisk:/dumper/first.snmpd_1113.by.snmpd.node0_RSP0_CPU0.ppc.txt
    harddisk:/dumper/first.snmpd_1113.by.snmpd.node0_RSP0_CPU0.ppc.cpu_info.Z
    

    However, it may be difficult to identify the root cause of the memory leak only by the above coredump files.

    For this reason, it is important to detect problems as early as possible and collect required logs several time while in a problematic state.

    How to identify memory leak processes

    The first step for memory leak troubleshooting is to identify which nods (RSP/LC) and which process the memory leak is observed in.

    The following approaches can be available to identify memory leak processes:

    • Monitoring by show memory compare command
    • Monitoring by show processes memory command
    • Monitoring by SNMP MIB

     

    Monitoring by show memory compare command

    Check if the dynamic memory usage per process is increasing continuously by show memory compare commands.

    1. show memory compare start
    2. Wait for a while (according to how fast and how much the memory usage increases)
    3. show memory compare end
    4. show memory compare report

    Collect a start snapshot first

    RP/0/RP0/CPU0:CRS#show memory compare start
    Successfully stored memory snapshot /harddisk:/malloc_dump/memcmp_start.out
    (*) A start snapshot file will be created under the hard disk as shown above.
    

    Collect an end snapshot next

    RP/0/RP0/CPU0:CRS#show memory compare end  
    Successfully stored memory snapshot /harddisk:/malloc_dump/memcmp_end.out
    
    (*) An end snapshot file will be created under the hard disk as shown above.
    

    Compare two start/end snapshots

    RP/0/RP0/CPU0:CRS#show memory compare report
    JID   name                 mem before   mem after    difference mallocs restart/exit/new
    ---   ----                 ----------   ---------    ---------- ------- ----------------
    117   arp                  662916       663040       124        2         
    65819 exec                 257480       257568       88         3         
    415   wdsysmon             4190296      4189240      -1056      -4         
     
    You are now free to remove snapshot memcmp_start.out and memcmp_end.out under /harddisk:/malloc_dump
    
    (*) Displays the difference between start/end snapshots created under the hard disk.
    

    ItemExecutive Summary

    JIDJob ID assigned to a process
    nameProcess name
    mem beforeDynamic memory usage in a start snapshot (Bytes)
    mem afterDynamic memory usage in an end snapshot (Bytes)
    differenceIncrease/decrease of the dynamic memory usage between start and end snapshots (Bytes)
    mallocsThe difference between the number of memory allocation API calls between start and end snapshots
    restart/exit/newA flag indicating whether the restart/exit/start process was executed between start and end snapshots

     

    Monitoring by show processes memory command

    Check If the dynamic memory usage per process is increasing continuously by collecting snapshots periodically via show process memory command.

    RP/0/RSP0/CPU0:ASR9K#show processes memory location 0/RSP0/CPU0
    JID          Text       Data      Stack    Dynamic Process
    ------ ---------- ---------- ---------- ---------- -------
    340        167936     110592     184320   22089728 parser_server
    241        589824     970752      81920   14663680 iedged
    1110       401408     131072     122880   13197312 ipv6_rib
    1109       405504     131072     159744   12632064 ipv4_rib
    1040      1056768    2867200     253952   10731520 bgp
    1105      2396160   10485760     135168    8548352 l2vpn_mgr
    60         110592      53248     139264    7630848 eth_server
    1133        65536      20480      32768    7479296 schema_server
    334        143360      69632      53248    7417856 nvgen_server
    1011      1331200     598016     135168    6258688 ospf
    -- snip -- 
    (*) Displays in descending order of the dynamic memory usage volume. Check to see if Dynamic is increasing continually.
    

    How to identify memory leak blocks in a process

    After a memory leak process is identified, the next step is to identify the memory leak blocks in a process.

    Collect process and DLL mapping information for a memory leak process first.

    • Collect process information
    • Collect process DLL mapping information

     

    Collect a memory snapshot a couple of times and compare the difference among those snapshots.

    1. Collect a start snapshot of the heap area
    2. Wait for a while (according to how fast and how much the memory usage increases)
    3. Collect an end snapshot of the heap area

     

    Collect a process running coredump.

    • Collect the process coredump files

     

    Collect process information

    Check the process information of a memory leak process using show processes command.

    RP/0/RSP0/CPU0:ASR9K#show processes snmpd
                      Job Id: 1113
                         PID: 844710071
             Executable path: /disk0/iosxr-infra-4.1.2/bin/snmpd
                  Instance #: 1
                  Version ID: 00.00.0000
                     Respawn: ON
               Respawn count: 2
      Max. spawns per minute: 12
                         -- snip --
    

    Process ID (PID) can be changed each time a process is restarted, but Job ID (JID) remains the same even after process restart.

    The above Job ID will be referred to <jid> in the command examples hereinafter.

     

    Collect DLL mapping information

    Collect DLL mapping information of a memory leak process using show dll command.

    RP/0/RSP0/CPU0:ASR9K#show dll <jid>
    

     

    Collect a snapshot of heap area in a process

    Collect the following three memory snapshots of process heap.

    • show memory heap dllname command
    • show memory heap allocated dllname command
    • show memory command

     

    show memory heap dllname command

    Collect a snapshot of heap summary in a process context.

    RP/0/RSP0/CPU0:ASR9K#show memory heap dllname <jid>
    Malloc summary for pid 844710071 process snmpd:
      Heapsize 11325440: allocd 10365240, free 27120, overhead 933080, high watermark 11325440
      Calls: mallocs 95070437; reallocs 964756; frees 94961917; [core-allocs 192; core-frees 7]
    -- snip --
    Block Allocated List:
    Total      Total      Block      Name/ID/Caller
    Usize      Size       Count
    0x003d2ba8 0x0040cc08 0x00007325  0x40020fd4 <N/A>
    0x0022118f 0x0026b558 0x00007325  0x40020fe8 <N/A>
    0x00100270 0x001002c8 0x00000009  GSP_MEM
    0x00051860 0x00059ca0 0x0000104e  0x400283dc <N/A>
    0x00046374 0x000535b0 0x0000161c  0x4e894c58 snmp_object_create_list
    0x000212a0 0x0002c380 0x0000161c  0x4e894ccc snmp_object_create_list
    0x0001e000 0x0001e008 0x00000001  0x4c5de600 chk_messaging_init_msgbuf
    0x0001e000 0x0001e008 0x00000001  0x4c5de5b0 chk_messaging_init_msgbuf
    0x00018316 0x000323d8 0x0000216f  0x4c2f6178 strdup
    0x00015acf 0x000210e8 0x0000108d  0x4006c8a8 <N/A>
    -- snip --
    

     

    show memory heap allocated dllname command

    Collect a snapshot of heap blocks in a process context.

    RP/0/RSP0/CPU0:ASR9K#show memory heap allocated dllname <jid>
    Block Allocated List:
    Usize      Size       Address    Name/ID/Caller
    0x0003fff0 0x0003fff8 0x501b6024  GSP_MEM
    0x0003fff0 0x0003fff8 0x50359024  GSP_MEM
    0x0003fff0 0x0003fff8 0x5042a024  GSP_MEM
    0x0003fff0 0x0003fff8 0x504b3024  GSP_MEM
    0x0001e000 0x0001e008 0x50080024  0x4c5de600 chk_messaging_init_msgbuf
    0x0001e000 0x0001e008 0x5009f024  0x4c5de5b0 chk_messaging_init_msgbuf
    0x00008400 0x00008408 0x50043aec  0x4c5dee70 chk_messaging_init
    0x00007a58 0x00007a60 0x501f7024 [chunk NH RULES CHUNK hdr]: 304 bytes, 100 per blk (a=0/f=100/h=0) (blks 0, 0)][NH RULES CHUNK chunk data: 0x501f7078 free: 100]
    0x00005a44 0x00005a4c 0x502d1024  0x4e8948f4 snmp_object_add_object
    0x0000413c 0x00004144 0x506c4024  0x4007f12c <N/A>
    -- snip --
    0x00000088 0x000000a0 0x500d25c4  0x40020fd4 <N/A>
    0x00000088 0x000000a0 0x50199504  0x40020fd4 <N/A>
    0x00000088 0x000000a0 0x50277038  0x40020fd4 <N/A>
    0x00000088 0x000000a0 0x50278e04  0x40020fd4 <N/A>
    0x00000088 0x000000a0 0x5027b304  0x40020fd4 <N/A>
    0x00000088 0x000000a0 0x503407d8  0x40020fd4 <N/A>
    -- snip --
    0x00000050 0x00000058 0x50706424  0x40020fe8 <N/A>
    0x0000004d 0x00000068 0x50199284  0x40020fe8 <N/A>
    0x0000004d 0x00000068 0x5023dab4  0x40020fe8 <N/A>
    0x0000004d 0x00000068 0x5027887c  0x40020fe8 <N/A>
    0x0000004d 0x00000068 0x5027b914  0x40020fe8 <N/A>
    -- snip --
    

     

    show memory command

    Collect memory summary in a process context.

    RP/0/RSP0/CPU0:ASR9K#show memory <jid>
    Physical Memory: 4096M total (1988M available)
    Application Memory : 3915M (1988M available)
    Image: 51M (bootram: 51M)
    Reserved: 128M, IOMem: 2028M, flashfsys: 0
    -- snip --
    pkg/bin/snmpd: jid 1113
    Address         Bytes           What
    40000000        557056          Program Text
    4c000000        67108864        DLL Text
    50000000        159744          Program/DLL Data
    50027000        4096            Program/DLL Data
    50028000        1961984         Allocated Memory
    50207000        36864           Physical Mapped Memory
    50210000        131072          Allocated Memory
    50237000        2011136         Allocated Memory
    50429000        7254016         Allocated Memory
    50b1c000        98304           Allocated Memory
    50b39000        1277952         Allocated Memory
    60100000        212992          Shared Memory
    60134000        98304           Shared Memory
    6014c000        40960           Shared Memory
    -- snip --
    ec51a000        4096            Program/DLL Data
    ec745000        8192            Program/DLL Data
    ec748000        8192            Program/DLL Data
    ec74b000        20480           Program/DLL Data
    ec751000        4096            Program/DLL Data
    Total Allocated Memory: 12734464
    Total Shared Memory: 844582912
    
    

     

    Compare snapshots of heap area in a process

    You can check if the number of allocated memory blocks has increased/decreased by comparing start/end snapshots of show memory heap dllname.

    You can see the increase in the following example (in red).

    Start snapshot

    RP/0/RSP0/CPU0:ASR9K#show memory heap dllname <jid>
    Malloc summary for pid 844710071 process snmpd:
      Heapsize 11325440: allocd 10365240, free 27120, overhead 933080, high watermark 11325440
      Calls: mallocs 95070437; reallocs 964756; frees 94961917; [core-allocs 192; core-frees 7]
    -- snip --
    Block Allocated List:
    Total      Total      Block      Name/ID/Caller
    Usize      Size       Count
    0x003d2ba8 0x0040cc08 0x00007325  0x40020fd4 <N/A>
    0x0022118f 0x0026b558 0x00007325  0x40020fe8 <N/A>
    0x00100270 0x001002c8 0x00000009  GSP_MEM
    0x00051860 0x00059ca0 0x0000104e  0x400283dc <N/A>
    0x00046374 0x000535b0 0x0000161c  0x4e894c58 snmp_object_create_list
    0x000212a0 0x0002c380 0x0000161c  0x4e894ccc snmp_object_create_list
    -- snip --
    

    End snapshot

    RP/0/RSP0/CPU0:ASR9K#show memory heap dllname <jid>
    Malloc summary for pid 844710071 process snmpd:
      Heapsize 13783040: allocd 12554380, free 47572, overhead 1181088, high watermark 13783040
      Calls: mallocs 122449585; reallocs 1248003; frees 122315443; [core-allocs 247; core-frees 20]
    -- snip --
    Block Allocated List:
    Total      Total      Block      Name/ID/Caller
    Usize      Size       Count
    0x004f19a8 0x0053c9dc 0x000094e5  0x40020fd4 <N/A>
    0x002c1237 0x003211d8 0x000094e5  0x40020fe8 <N/A>
    0x00100270 0x001002c8 0x00000009  GSP_MEM
    0x00051860 0x00059ca0 0x0000104e  0x400283dc <N/A>
    0x00046374 0x000535b0 0x0000161c  0x4e894c58 snmp_object_create_list
    0x0002fa12 0x00063210 0x0000420b  0x4c2f6178 strdup
    0x000212a0 0x0002c380 0x0000161c  0x4e894ccc snmp_object_create_list
    -- snip --
    
    

     

    Collect process running coredump

    You may be able to identify the cause of memory leak by comparing the Start and End snapshots, but you may need to collect process running coredump in some cases.

    Collect process running coredump

    RP/0/RSP0/CPU0:ASR9K#dumpcore running <jid>
    

    Run the above command and then collect the following coredump files:

    harddisk:/dumper/first.snmpd_<jid>.by.dumper_gen.node0_RSP0_CPU0.ppc.Z
    harddisk:/dumper/first.snmpd_<jid>.by.dumper_gen.node0_RSP0_CPU0.ppc.txt
    harddisk:/dumper/first.snmpd_<jid>.by.dumper_gen.node0_RSP0_CPU0.ppc.cpu_info.Z
    

    The name of the running coredump files includes by.dumper_gen, which is different from the case of process crash.

    Related Information

    Original Document: https://supportforums.cisco.com/ja/document/11935591
    Author: Taisuke Sasaki
    Posted on October 1, 2013

    Comments
    Beginner

    I came across the same issue with sshd_child_handler after upgrading ASR9001 to IOS XR Software, Version 6.3.3. Still not sure how to fix this. Will be escalating  to the TAC.

    CreatePlease to create content
    Content for Community-Ad
    June's Community Spotlight Awards