
Device experiencing high memory utilization 9300 switch

We are receiving a notification from DNA Center that one of our switches has high memory utilization. Can anyone suggest what the reason could be?

Switch model: 9300-48 in a stack, with only 832 free; please see the screenshot. Thanks.

26 Replies

Andrew9003
Level 1

I have the same scenario with a stack of 2x 9300-24P running 17.6.3, where switch 1 (the active one) has high memory utilization:
#sh processes memory platform sorted location switch 1 r0
System memory: 7750576K total, 7053748K used, 696828K free,
Lowest: 679132K
  Pid    Text     Data  Stack  Dynamic      RSS  Name
----------------------------------------------------------------------
 2507      87  3734740    136  3619496  3734740  pubd
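
As a rough aid for reading this output, the "System memory:" line can be turned into a utilization percentage. This is only a minimal Python sketch, not a Cisco tool: the function name `parse_system_memory` is made up for illustration, and it assumes the line keeps the "<n>K total, <n>K used, <n>K free" layout shown above.

```python
import re


def parse_system_memory(line):
    """Extract total/used/free (in KB) from a 'System memory:' summary line.

    Assumes the '<n>K total, <n>K used, <n>K free' layout seen in this thread.
    """
    match = re.search(r"(\d+)K total,\s*(\d+)K used,\s*(\d+)K free", line)
    if match is None:
        raise ValueError("not a 'System memory:' line")
    total_kb, used_kb, free_kb = (int(group) for group in match.groups())
    return {
        "total_kb": total_kb,
        "used_kb": used_kb,
        "free_kb": free_kb,
        # Percentage of total memory in use, rounded to one decimal place.
        "used_pct": round(100.0 * used_kb / total_kb, 1),
    }


# The summary line copied from the output above.
line = "System memory: 7750576K total, 7053748K used, 696828K free,"
stats = parse_system_memory(line)
print(stats["used_pct"])
```

For the line above this works out to roughly 91% used, and the pubd RSS of 3734740K on its own is close to half of the total.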
Yes, there is a DNAC appliance.

 

The choices are: 

1.  Reboot the switch

2.  Disable DNAC telemetry: 

 

conf t
 no nmsp enable
end

 

Please note, "upgrade the firmware" is not a solution but an excuse.  Every IOS-XE release leaks memory like a hydrant.  If DNAC telemetry does not leak memory in one specific version, another process might. 

Hi Leo, can I ask why I should disable DNAC? The device is provisioned via DNAC.


@Andrew9003 wrote:
why to disable DNAC

I meant disable the "telemetry" between the switch and DNAC.  

Why?  Because DNAC and/or DNA Spaces are very well known to cause memory leaks.  

Andrew9003
Level 1

Thank you for the explanation. I will try it out in a maintenance window because it involves a reboot.

shambhu.kumar
Spotlight

Please check bug ID "CSCwe09745". This may help.

Symptom: A catalyst switch, which is managed by DNAC, may exhibit a memory leak within the Pubd process if the switch is not able to connect with Telemetry to DNAC.


The switch needs to be managed by DNAC and have the tls-native protocol configured along with some telemetry subscriptions. The leak will occur when the switch attempts to connect to DNAC but is unsuccessful. The state will be "connecting", as verified via the below commands:

IOS-XE 17.6 and Earlier:

CAT9300#show telemetry internal connection
Telemetry connections

Index Peer Address               Port  VRF Source Address             State
----- -------------------------- ----- --- -------------------------- ----------
9825  X.X.X.X                    25103 0   Y.Y.Y.Y                    Connecting

IOS-XE 17.7 and Later:

CAT9300#show telemetry connection all
Telemetry connections

Index Peer Address               Port  VRF Source Address             State      State Description
----- -------------------------- ----- --- -------------------------- ---------- --------------------
9825  X.X.X.X                    25103 0   Y.Y.Y.Y                    Connecting Connection request made to transport handler
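
If you collect this output from many switches, the state check described above can be scripted. The following is a minimal Python sketch under some assumptions: the function names are hypothetical, the text has already been captured from the CLI by some other means, and every data row has all six leading columns populated (Index, Peer Address, Port, VRF, Source Address, State) as in the samples here.

```python
def parse_telemetry_connections(output):
    """Parse 'show telemetry ... connection' text into a list of dicts.

    Assumes whitespace-separated columns in the order shown in this thread,
    with an optional free-text State Description at the end (17.7 and later).
    """
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows start with a numeric index; headers and rulers do not.
        if len(parts) < 6 or not parts[0].isdigit():
            continue
        rows.append({
            "index": int(parts[0]),
            "peer": parts[1],
            "port": int(parts[2]),
            "vrf": parts[3],
            "source": parts[4],
            "state": parts[5],
            "description": " ".join(parts[6:]),
        })
    return rows


def stuck_connections(rows):
    """Return the connections that are not in the Active state."""
    return [row for row in rows if row["state"] != "Active"]


sample = """Telemetry connections

Index Peer Address Port VRF Source Address State State Description
----- -------------------------- ----- --- -------------------------- ---------- --------------------
9825 X.X.X.X 25103 0 Y.Y.Y.Y Connecting Connection request made to transport handler
"""

for row in stuck_connections(parse_telemetry_connections(sample)):
    print(row["index"], row["peer"], row["state"])
```

A switch that shows up in the `stuck_connections` result with state Connecting matches the condition described for this bug; whether the bug actually applies still has to be confirmed against the bug ID.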

Workaround:

1.  Disable telemetry on the switch, and troubleshoot why telemetry is not able to successfully connect to DNAC.

OR

2.  From the DNAC side, execute the following:
* Go to the Cisco DNA Center GUI > Provision > Inventory.
* Select the affected device and go to Actions > Telemetry > Update Telemetry Settings.
* Select the option for Force Configuration Push.

Regards

Shambhu Kumar

 

Hey Kumar, yes, it is exactly the same scenario when checking via the CLI with the command #show telemetry internal connection.

I will try workaround no. 2 and see what happens. Thanks.

Andrew9003
Level 1

After doing the steps in DNAC, the state that previously showed Connecting is now Active:

#show telemetry internal connection
Telemetry connections

Index Peer Address               Port  VRF Source Address             State
----- -------------------------- ----- --- -------------------------- ----------
22684 10.107.x.y                 25103 0   10.119.x.y                 Active

I hope it will bring memory usage within the Pubd process back to normal.

Regards

Shambhu Kumar

 

I hope so too. Nothing much has changed yet on:

#sh processes memory platform sorted location switch 1 r0
System memory: 7750576K total, 7061900K used, 688676K free,
Lowest: 662872K
  Pid    Text     Data  Stack  Dynamic      RSS  Name
----------------------------------------------------------------------
 2507      87  3743568    136  3625548  3743568  pubd
 5972  256726   840496    136      484   840496  linux_iosd-imag
16343     203   379124    136    77720   379124  fed main event

I guess it would require a reboot.


@Andrew9003 wrote:

System memory: 7750576K total, 7061900K used, 688676K free,
Lowest: 662872K
  Pid    Text     Data  Stack  Dynamic      RSS  Name
----------------------------------------------------------------------
 2507      87  3743568    136  3625548  3743568  pubd
 5972  256726   840496    136      484   840496  linux_iosd-imag
16343     203   379124    136    77720   379124  fed main event


1.  NOTHING should be "above" the linux_iosd-imag process.  Anything listed above the linux_iosd-imag process in the sorted output is bad. 

2.  Regularly check the control-plane memory & CPU utilization.  When I say "regularly", I really mean DAILY.  Use the following commands: 

sh platform resources
sh platform software status control-processor brief

NOTE:  Memory utilization of <40% is normal.  Memory utilization of >50% is an irrefutable sign of a memory leak.

3.  If the switches are in a stack/VSS, the chances of a memory leak with the "stackmgr" process are higher.  

4.  Regular reboot/cold-reboot is the best workaround.  Make it a point to reboot/cold-reboot the stack every 12 months.  
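
For the daily check, the two rules of thumb above (nothing ranked above linux_iosd-imag, and the <40% / >50% utilization marks) can be expressed as a small script. This is a minimal Python sketch with hypothetical function names; it assumes the seven-column process table quoted in this thread, and the "watch" label for the 40-50% band is my own placeholder, since the thread does not say how to treat that range.

```python
IOSD_NAME = "linux_iosd-imag"


def parse_process_rows(output):
    """Return (name, rss_kb) pairs from 'sh processes memory platform sorted' text.

    Assumes the column order Pid, Text, Data, Stack, Dynamic, RSS, Name.
    """
    rows = []
    for line in output.splitlines():
        # Split into at most 7 fields so names with spaces stay intact.
        parts = line.split(None, 6)
        if len(parts) == 7 and parts[0].isdigit():
            rows.append((parts[6].strip(), int(parts[5])))
    return rows


def processes_above_iosd(rows):
    """Names of processes whose RSS ranks above linux_iosd-imag."""
    ordered = [name for name, _ in sorted(rows, key=lambda r: r[1], reverse=True)]
    if IOSD_NAME not in ordered:
        return []
    return ordered[:ordered.index(IOSD_NAME)]


def classify_utilization(used_pct):
    """Apply the thread's thresholds; 'watch' is an assumed label for 40-50%."""
    if used_pct < 40:
        return "normal"
    if used_pct > 50:
        return "suspected leak"
    return "watch"


sample = """System memory: 7750576K total, 7061900K used, 688676K free,
Lowest: 662872K
Pid Text Data Stack Dynamic RSS Name
----------------------------------------------------------------------
2507 87 3743568 136 3625548 3743568 pubd
5972 256726 840496 136 484 840496 linux_iosd-imag
16343 203 379124 136 77720 379124 fed main event
"""

print(processes_above_iosd(parse_process_rows(sample)))
print(classify_utilization(100.0 * 7061900 / 7750576))
```

With the output quoted above, pubd is the only process ranked above linux_iosd-imag, and the overall utilization falls well past the 50% mark.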


@Andrew9003 wrote:
2507     87 3743568 136 3625548 3743568 pubd
5972 256726  840496 136     484  840496 linux_iosd-imag

Disabling telemetry will not bring down the "stuck" memory and it will remain at that level until the controller crashes or reboots.  

Disabling telemetry, however, will "stop" the memory leak (or at least minimize its rate of climb).
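
To see whether the rate of climb really drops, you can compare the pubd RSS between two snapshots. Below is a minimal Python sketch with a hypothetical function name. The two RSS values are the ones posted earlier in this thread, but the time between them was never stated, so the 24 hours used here is purely a placeholder.

```python
def rss_growth_kb_per_hour(first_kb, second_kb, hours_between):
    """Average RSS growth in KB per hour between two snapshots."""
    if hours_between <= 0:
        raise ValueError("hours_between must be positive")
    return (second_kb - first_kb) / hours_between


# pubd RSS values from the two outputs posted in this thread.
FIRST_RSS_KB = 3734740
SECOND_RSS_KB = 3743568

# ASSUMPTION: the real interval between the two outputs is unknown.
PLACEHOLDER_HOURS = 24.0

rate = rss_growth_kb_per_hour(FIRST_RSS_KB, SECOND_RSS_KB, PLACEHOLDER_HOURS)
print(f"pubd grew {SECOND_RSS_KB - FIRST_RSS_KB} KB, about {rate:.1f} KB/hour")
```

Taking the same measurement before and after the telemetry change, over intervals you actually recorded, would show whether the leak has stopped or only slowed.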