CSCvd45973 - Catalyst 3850/3650 - memory leak in platform_mgr process - 8

Hi, I have the same problem on a 3850 running IOS-XE Everest 16.6.2.

2021-05-05 00:01:13 Local3.Error 10.137.245.6 6701: *May 5 00:11:52.229 Italy: %PLATFORM-3-ELEMENT_CRITICAL:Switch 1 R0/0: smand: 1/RP/0: Used Memory value 98% exceeds critical level 95%
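For anyone who collects these messages on a syslog server, here is a minimal sketch of pulling the used percentage and the threshold out of the message above. The regular expression and the function name `parse_element_alarm` are illustrative assumptions based on this one log line, not a documented Cisco message format.

```python
import re

# Sample line copied from the post above (syslog server prefix removed).
LINE = ("*May 5 00:11:52.229 Italy: %PLATFORM-3-ELEMENT_CRITICAL:Switch 1 R0/0: "
        "smand: 1/RP/0: Used Memory value 98% exceeds critical level 95%")

# Assumed layout: %FACILITY-SEVERITY-MNEMONIC: ... Used Memory value N% exceeds <level> level M%
PATTERN = re.compile(
    r"%(?P<facility>[A-Z_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z_]+):"
    r".*?Used Memory value (?P<used>\d+)% exceeds (?P<level>\w+) level (?P<threshold>\d+)%"
)

def parse_element_alarm(line):
    """Return the alarm fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared directly.
    for key in ("severity", "used", "threshold"):
        fields[key] = int(fields[key])
    return fields

alarm = parse_element_alarm(LINE)
print(alarm["mnemonic"], alarm["used"], alarm["threshold"])
```

A script like this could be pointed at the syslog file to flag switches that report the same alarm.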

Accepted Solutions


@giovannibardetta wrote:
Memory (kB)
Slot  Status   Total   Used (Pct)    Free (Pct)  Committed (Pct)
1-RP0 Critical 3983076 3892612 (98%) 90464 ( 2%) 5042224 (127%)

This confirms there is a memory leak. I do not know how long it will be before the memory pool runs out, but it is not long. My guess is that this switch will crash within the next 3 days.
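As a quick sanity check on the quoted row, the percentages can be recomputed from the raw kB values. This is only a sketch: the helper name `pct` is made up for illustration, and rounding to the nearest whole percent is an assumption about how the switch displays the figures.

```python
# Values copied from the "Memory (kB)" row for slot 1-RP0.
total_kb = 3983076
used_kb = 3892612
free_kb = 90464
committed_kb = 5042224

def pct(part, whole):
    """Percentage of `whole`, rounded to the nearest integer (assumed display rounding)."""
    return round(100 * part / whole)

print("used      ", pct(used_kb, total_kb), "%")       # 98
print("free      ", pct(free_kb, total_kb), "%")       # 2
print("committed ", pct(committed_kb, total_kb), "%")  # 127

# Used and free add up exactly to the total in this sample.
print(used_kb + free_kb == total_kb)                   # True
```

The committed figure being above 100% means more memory has been committed than the switch physically has, which fits with the 2% free reading.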


@giovannibardetta wrote:
31653 177098 721968 136 120 721968 2424144 linux_iosd-imag
19285 116 139336 136 70708 139336 2371708 fed main event
20814 303 55560 136 2532 55560 1402192 sif_mgr
19666 1007 56932 136 2276 56932 1130044 platform_mgr

These four are the "usual suspects". Only TAC will know how to decipher what is causing the process "linux_iosd-imag" to chomp up so much memory.
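To rank these rows without eyeballing them, a small parser is enough. This is a rough sketch that assumes the columns follow the header printed by the command (Pid, Text, Data, Stack, Dynamic, RSS, Total, Name) and that process names may contain spaces, as "fed main event" does. The function name `parse_row` is hypothetical.

```python
# The four rows quoted above, copied verbatim.
ROWS = """\
31653 177098 721968 136 120 721968 2424144 linux_iosd-imag
19285 116 139336 136 70708 139336 2371708 fed main event
20814 303 55560 136 2532 55560 1402192 sif_mgr
19666 1007 56932 136 2276 56932 1130044 platform_mgr
"""

# Assumed column order, taken from the header line of the command output.
FIELDS = ("pid", "text", "data", "stack", "dynamic", "rss", "total")

def parse_row(line):
    """Split one row into seven integer columns plus a name that may contain spaces."""
    parts = line.split(None, 7)
    row = dict(zip(FIELDS, (int(value) for value in parts[:7])))
    row["name"] = parts[7].strip()
    return row

procs = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

by_total = sorted(procs, key=lambda p: p["total"], reverse=True)
by_rss = sorted(procs, key=lambda p: p["rss"], reverse=True)

for p in by_total:
    print(f'{p["name"]:<16} total={p["total"]:>8} rss={p["rss"]:>7}')
```

Sorting by the RSS column instead of Total keeps linux_iosd-imag on top but swaps sif_mgr and platform_mgr, so it is worth checking both columns when comparing outputs taken at different times.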

So here are the options:

  1. Raise a TAC Case and let them pinpoint the cause of the memory leak.  
  2. Reboot the switch. 
  3. Upgrade the firmware to something more recent. 

None of these options is good because Cisco has been unable to fix a plethora of memory and CPU issues affecting IOS-XE.

If you go with Option #2, I highly recommend a COLD reboot if possible. A cold reboot means pulling the power cable(s) instead of issuing the "reload" command. If a cold reboot is not possible, then a reload is also good. Option #2 is not a good long-term solution because this memory leak will eventually return.

Option #3 is one of the worst because, like I said before, there are a lot of memory and CPU issues affecting IOS-XE and I have no idea what monsters will appear with upgraded firmware.

Along with Option #3, Option #1 is one of the worst. By the time TAC starts to troubleshoot the memory leak, the switch will have crashed. One of the bugs prevalent on IOS-XE is the inability to generate a crashlog or a coredump before a crash, so there is a strong chance that this switch will not generate either one for TAC to decipher. Another thing: when TAC starts to troubleshoot the switch, they will see that it is running a fairly old IOS version, recommend an IOS upgrade and then close the TAC Case.

If it were up to me, I would pick Option #2 with a COLD reboot.

Hope this helps.

4 Replies

balaji.bandi
Hall of Fame

Try a 16.9.x version when you get a maintenance window.

 

BB


Leo Laohoo
Hall of Fame

@giovannibardetta wrote:

Used Memory value 98%


Post the complete output of the following commands:

sh platform software status con brief
sh proc memory platform sort location switch 1 r0

I want to find out where the memory leak is coming from.


xxxxxxxx#sh platform software status con brief
Load Average
Slot  Status   1-Min  5-Min 15-Min
1-RP0 Healthy   0.22   0.23   0.23

Memory (kB)
Slot  Status   Total   Used (Pct)    Free (Pct)  Committed (Pct)
1-RP0 Critical 3983076 3892612 (98%) 90464 ( 2%) 5042224 (127%)

CPU Utilization
Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
1-RP0   0   2.40   0.90   0.00  96.70   0.00   0.00   0.00
        1   1.80   0.90   0.00  97.19   0.00   0.00   0.10
        2   4.69   0.89   0.00  94.20   0.00   0.09   0.09
        3   3.60   0.80   0.00  95.50   0.00   0.00   0.10

SW1.RTDGUPI2#
SW1.RTDGUPI2#
SW1.RTDGUPI2#sh proc memory platform sort location switch 1 r0
System memory: 3983076K total, 3843320K used, 139756K free,
Lowest: 139756K
Pid Text Data Stack Dynamic RSS Total Name
--------------------------------------------------------------------------------
31653 177098 721968 136 120 721968 2424144 linux_iosd-imag
19285 116 139336 136 70708 139336 2371708 fed main event
20814 303 55560 136 2532 55560 1402192 sif_mgr
19666 1007 56932 136 2276 56932 1130044 platform_mgr
29881 287 47908 136 7628 47908 970768 cli_agent
2238 649 85732 136 22416 85732 866996 smand
30688 170 48208 136 3596 48208 864928 dbm
28482 8817 48388 136 7552 48388 823704 fman_fp_image
2468 127 55928 0 268 55928 773596 smd
30957 8461 39060 136 2208 39060 729148 fman_rp
1409 430 30716 136 1288 30716 726652 repm
3300 251 30664 136 6392 30664 711852 tms
13873 38 26728 136 1340 26728 710040 bt_logger
15008 519 28616 136 2072 28616 704956 hman
16308 147 25728 136 2420 25728 703324 lman
864 40 25032 136 1076 25032 701376 psd
18903 198 22328 136 400 22328 700900 nif_mgr
21302 416 25352 136 1660 25352 700768 stack_mgr
30118 114 23712 136 1084 23712 700268 cmm
620 45 24488 136 1820 24488 697892 plogd
18750 62 23772 136 1020 23772 697660 epc_ws_liaison
15910 74 22508 136 432 22508 692260 keyman
3512 604 2236 132 132 2236 212376 libvirtd
3477 753 1608 132 132 1608 17176 virtlogd
13579 7 2956 136 148 2956 16588 auto_upgrade_se
88 314 2900 132 132 2900 12432 systemd-journal
16647 974 7488 136 5776 7488 10120 ncd.sh
25322 974 6452 136 5644 6452 9988 issu_stack.sh
25313 974 6364 136 5644 6364 9988 issu_stack.sh
15381 974 7372 136 5644 7372 9988 issu_stack.sh
13099 974 7216 136 5516 7216 9860 auto_upgrade_cl
19254 974 6184 136 4520 6184 8864 periodic.sh
31280 7 1948 136 148 1948 8712 rotee
30573 7 1864 136 148 1864 8712 rotee
30040 7 1848 136 148 1848 8712 rotee
29320 7 1848 136 148 1848 8712 rotee
28940 7 1848 136 148 1848 8712 rotee
27917 7 1848 136 148 1848 8712 rotee
20920 7 1848 136 148 1848 8712 rotee
19902 7 1908 136 148 1908 8712 rotee
19112 7 1860 136 148 1860 8712 rotee
18918 7 1948 136 148 1948 8712 rotee
18392 7 1856 136 148 1856 8712 rotee
18257 7 1856 136 148 1856 8712 rotee
17242 7 1848 136 148 1848 8712 rotee
16483 7 1852 136 148 1852 8712 rotee
16080 7 1856 136 148 1856 8712 rotee
15572 7 1848 136 148 1848 8712 rotee
14847 7 1852 136 148 1852 8712 rotee
14666 7 1968 136 148 1968 8712 rotee
14366 7 1848 136 148 1848 8712 rotee
13607 7 1852 136 148 1852 8712 rotee
2929 7 1848 136 148 1848 8712 rotee
1611 7 1856 136 148 1856 8712 rotee
1035 7 1852 136 148 1852 8712 rotee
16347 7 1868 132 148 1868 8708 rotee
12085 7 1864 132 148 1864 8708 rotee
11955 7 1848 132 148 1848 8708 rotee
32715 7 1852 136 148 1852 8592 rotee
32386 7 1848 136 148 1848 8592 rotee
20318 7 1860 136 148 1860 8592 rotee
19869 7 1860 136 148 1860 8592 rotee
18631 7 1848 136 148 1848 8592 rotee
17799 7 1848 136 148 1848 8592 rotee
17612 7 1848 136 148 1848 8592 rotee
15190 7 1848 136 148 1848 8592 rotee
14126 7 1856 136 148 1856 8592 rotee
13209 7 1848 136 148 1848 8592 rotee
12856 7 1852 136 148 1852 8592 rotee
12540 7 1860 136 148 1860 8592 rotee
1939 7 1852 136 148 1852 8592 rotee
11295 7 1844 132 148 1844 8588 rotee
4389 7 1844 132 148 1844 8588 rotee
4047 7 1864 132 148 1864 8588 rotee
4010 7 1864 132 148 1864 8588 rotee
3882 7 1844 132 148 1844 8588 rotee
1 1397 4296 132 1344 4296 7920 systemd
10614 974 5188 132 3476 5188 7816 rollback_timer.
11967 974 4408 132 2648 4408 6988 pvp.sh
3479 974 4120 132 2452 4120 6792 reflector.sh
3484 974 4092 132 2432 4092 6772 droputil.sh
14099 974 4060 136 2420 4060 6764 btrace_rotate.s
11844 974 4052 132 2424 4052 6764 psvp.sh
17822 974 4040 136 2408 4040 6752 btrace_rotate.s
15718 974 3976 132 2288 3976 6628 pvp.sh
3490 49 1076 132 132 1076 6180 rpcbind
19485 974 3308 136 1644 3308 5988 pman.sh
17593 974 3372 136 1644 3372 5988 pman.sh
16882 974 3304 136 1644 3304 5988 pman.sh
12703 974 3308 136 1644 3308 5988 pman.sh
12437 974 3304 136 1644 3304 5988 pman.sh
1298 974 3312 136 1644 3312 5988 pman.sh
32086 974 3312 136 1640 3312 5984 pman.sh
31037 974 3312 136 1640 3312 5984 pman.sh
2664 974 3312 136 1640 3312 5984 pman.sh
752 974 3312 136 1640 3312 5984 pman.sh
29458 974 3308 136 1636 3308 5980 pman.sh
28808 974 3308 136 1636 3308 5980 pman.sh
20380 974 3308 136 1636 3308 5980 pman.sh
18366 974 3308 136 1636 3308 5980 pman.sh
15111 974 3308 136 1636 3308 5980 pman.sh
14767 974 3308 136 1636 3308 5980 pman.sh
14413 974 3372 136 1636 3372 5980 pman.sh
13886 974 3308 136 1636 3308 5980 pman.sh
13064 974 3376 136 1636 3376 5980 pman.sh
12257 974 3372 136 1636 3372 5980 pman.sh
32290 974 3308 136 1632 3308 5976 pman.sh
30130 974 3308 136 1632 3308 5976 pman.sh
28414 974 3308 136 1632 3308 5976 pman.sh
19771 974 3372 136 1632 3372 5976 pman.sh
15514 974 3372 136 1632 3372 5976 pman.sh
1518 974 3304 136 1632 3304 5976 pman.sh
27425 974 3084 136 1504 3084 5848 pman.sh
16979 974 3084 136 1504 3084 5848 pman.sh
17378 974 3084 132 1504 3084 5844 pman.sh
16687 974 3148 132 1504 3148 5844 pman.sh
4086 974 2904 132 1204 2904 5544 iptbl.sh
102 417 1936 132 264 1936 5480 systemd-udevd
3480 28 3152 132 132 3152 5148 klogd
20858 974 2212 136 644 2212 4988 sort_files_by_i
3498 974 2212 132 528 2212 4868 oom.sh
27427 974 1704 136 132 1704 4476 stack_sntp.sh
3534 974 936 132 132 936 4472 boothelper_evt.
3482 974 1540 132 132 1540 4472 libvirtd.sh
27450 58 816 136 132 816 4264 sntp
3600 109 944 132 132 944 4060 rpc.mountd
3540 77 1456 132 132 1456 3848 rpc.statd
10616 170 968 132 132 968 3376 xinetd
10610 170 952 132 132 952 3376 xinetd
5497 170 904 132 132 904 3376 xinetd
3494 170 844 132 132 844 3376 xinetd
25366 22 700 136 132 700 2784 inotifywait
25329 22 700 136 132 700 2784 inotifywait
25320 22 684 136 132 684 2784 inotifywait
25312 22 756 136 132 756 2784 inotifywait
19516 22 640 136 132 640 2784 inotifywait
19440 22 644 136 132 644 2784 inotifywait
4062 22 700 136 132 700 2784 inotifywait
25498 22 728 132 132 728 2780 inotifywait
16431 22 696 132 132 696 2780 inotifywait
12156 22 696 132 132 696 2780 inotifywait
11962 22 696 132 132 696 2780 inotifywait
11309 22 688 132 132 688 2780 inotifywait
4412 22 696 132 132 696 2780 inotifywait
4072 22 720 132 132 720 2780 inotifywait
3545 22 652 132 132 652 2780 inotifywait
19580 43 596 136 132 596 2624 sleep
19344 43 596 136 132 596 2624 sleep
3904 43 572 132 132 572 2616 sleep

