11-14-2011 12:25 AM - edited 03-07-2019 03:22 AM
Can someone tell me which process is consuming high memory on my C6509 switch? File attached for reference.
11-18-2011 06:39 AM
Hi Samarjit,
So something is wrong with the registers - we forced those to correct values and saw the problem gone for a while. Then they reverted to incorrect ones. I would try to physically reseat the card and see if it helps. Otherwise it can be a HW problem or a new DDTS - then you'll need to open a TAC case to dig deeper.
Nik
11-14-2011 05:20 AM
Hi,
Can you please start with "show proc mem" and "show mem sum", plus "show ver", so we can understand the details of the node? The attached output is quite hard to read without knowing the background.
Nik
11-14-2011 10:21 PM
11-15-2011 01:47 AM
Thanks,
I see most of the memory is taken by ios-base. That is a cumulative system process.
Can you get
show process memory detail ios-base
show process memory detail ios-base taskid
show memory detailed ios-base dead
to check it.
Nik
11-15-2011 02:02 AM
11-15-2011 03:37 AM
Ok,
So, to answer your initial question:
"Can someone tell me which process is consuming high memory on my C6509 switch?"
Most of the memory is held by ios-base, which is the main cumulative process. Below you can see the sub-processes within it.
sh processes memory detailed ios-base
System Memory : 524288K total, 351957K used, 172331K free, 1000K kernel reserved
Lowest(b) : 176058368
Process sbin/ios-base, type IOS, PID = 16407
163756K total, 87504K text, 4K data, 96K stack, 76152K dynamic
Heap : 80733024 total, 76857816 used, 3875208 free
Task TTY Allocated Freed Holding Getbufs Retbufs TaskName
0 0 79395328 1752 79056968 0 0 *Init*
0 0 1293906040 1232002068 8458308 6781852 0 *Dead*
229 0 1113240 56175112 848392 0 0 FM core
14 0 76093160 75601240 492792 62843372 62843372 Pool Manager
37 0 1585208 595648 453752 0 0 IPC Seat Manage
49 0 3017944 2607832 352312 0 0 rf proxy rp age
326 0 1167632 777256 344048 0 0 RPC pagp_switch
27 0 692144 372488 316008 0 0 Entity MIB API
201 0 5891384 5571680 297824 0 0 ARP HA
39 0 268560 624 268808 120600 0 EEM ED Syslog
344 0 384282544 383994632 254624 0 0 Port manager pe
117 0 440448 245248 176896 0 0 PF_Init Process
4 0 1577055416 1576708056 151024 0 0 Service Task
121 0 1705624 1225664 145248 0 0 CHKPT rcv MSG
221 0 218360 109680 135544 0 0 XDR mcast
404 0 20331912 20075512 125896 0 0 SNMP ConfCopyPr
363 0 243040 55136 117432 0 0 Entity MIB C6k
5 0 988752 918200 115592 0 0 Service Task
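As a side note, if you save output like the table above to a file, a short script can rank the tasks by the Holding column. This is just a convenience sketch (not part of IOS); the sample lines are taken from the output above:

```python
# Rank tasks from "show processes memory detailed" output by the Holding
# column (5th field). Sample lines copied from the output above.
task_lines = [
    "0 0 79395328 1752 79056968 0 0 *Init*",
    "0 0 1293906040 1232002068 8458308 6781852 0 *Dead*",
    "229 0 1113240 56175112 848392 0 0 FM core",
    "14 0 76093160 75601240 492792 62843372 62843372 Pool Manager",
]

def holding_bytes(line):
    # Columns: Task TTY Allocated Freed Holding Getbufs Retbufs TaskName
    return int(line.split()[4])

ranked = sorted(task_lines, key=holding_bytes, reverse=True)
for line in ranked:
    # maxsplit=7 keeps multi-word task names (e.g. "Pool Manager") intact
    print(f"{holding_bytes(line):>12}  {line.split(maxsplit=7)[7]}")
```

With the sample data, *Init* comes out on top, matching what we see in the raw output.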
If you want to break this down further, you can run
show process memory detail ios-base taskid
for each particular task.
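For example, to drill into the *Init* task shown above, the taskid value comes from the Task column of the table (0 for *Init* here) - exact syntax may vary slightly by release:

```
show process memory detail ios-base taskid 0
```

Repeat with the taskid of any other task whose Holding value looks suspicious.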
Please let me know if you have any further queries.
Nik
11-15-2011 04:38 AM
Hi Nik
Thanks for your thorough analysis. What I can see is that the task named "Init" is consuming a lot of memory and not releasing it, compared to the other tasks. Is there any workaround that can reduce the memory consumption?
Actually, I was troubleshooting a problem with a Cisco 6509 switch and found the memory usage high for this process. The initial problem started with ports of a particular module automatically flapping up/down without any reason. Later, the SUP engine (WS-SUP720-3B) started restarting automatically. I reset the SUP engine but the problem was not resolved. Just now, when I logged into the module (WS-X6704-10GE) where the ports were initially oscillating between up and down, I saw the CPU utilization of that module running at 100%. I have attached the tech-support of the module for your reference.
11-15-2011 07:25 AM
Hi Samarjit,
The Init process is responsible for ION initialization and for starting the other system processes that run your OS, so it allocates its memory at startup and usually holds the same amount throughout operation. Also, you have enough free memory, so that should not be an issue unless it is leaking continuously:
------------------ show process memory detailed ------------------
System Memory : 262144K total, 154525K used, 107619K free, 1000K kernel reserved
Regarding the high CPU - it is traffic driven. I mean some traffic is being punted to the CPU:
------------------ show process cpu detailed ------------------
CPU utilization for five seconds: 100%; one minute: 100%; five minutes: 100%
12307 99.7% 98.9% 98.5% ios-base 3h55m
1 0.1% 0.2% 0.2% 22 Intr 47.972
2 0.6% 0.6% 0.6% 5 Ready 79.937
3 0.3% 0.1% 0.2% 10 Receive 4.068
4 0.0% 0.0% 0.0% 10 Receive 11.435
5 0.0% 0.0% 0.0% 11 Nanosleep 0.632
6 94.9% 96.4% 96.4% 22 Intr 3h48m
Process sbin/ios-base, type IOS, PID = 12307
CPU utilization for five seconds: 0%/99%; one minute: 0%; five minutes: 0%
The /99% means interrupt-driven load, i.e., the CPU is handling particular packets at interrupt level.
Please check if any of your interfaces have the following:
- High broadcast rate
- Input drops
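For example (command forms may vary slightly by IOS release; GiX/Y stands for the interface in question), something like:

```
show interfaces GiX/Y | include broadcasts|input
show interfaces GiX/Y counters errors
```

Run each a couple of times and compare the broadcast and drop counters to see whether they are incrementing.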
You can run the following debug (safe to run in production) to see what packets are punted to the CPU:
debug netdr cap rx
show netdr cap
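The abbreviated forms above expand to the following on most SUP720 images (the clear command resets the capture buffer so you can take a fresh sample):

```
debug netdr capture rx
show netdr captured-packets
debug netdr clear-capture
```

Look at the source MAC/IP of the captured packets to identify who is flooding the CPU.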
One more thing regarding the crashes - those seem to be resets due to lost power:
System returned to ROM by power-on
I would also advise checking that everything is good with the power sources and the environment in the server room - temperature, etc.
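On the 6500 these can be checked from the CLI, for example:

```
show environment status
show environment temperature
show power
```

Any alarms, failed supplies, or over-temperature readings would show up here.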
Nik
11-16-2011 02:59 AM
Hi Nik
Thanks for your great help. You have provided fantastic analysis of the case data as well as solid troubleshooting steps. I ran the debug command you suggested and caught huge broadcasts generated by one server in the data center. After filtering those broadcasts close to the source, the CPU utilization of the SUP engine dropped, but the module WS-X6704-10GE, for which I shared the tech-support yesterday, is still undergoing high CPU (100% at all times). Can you please suggest some additional troubleshooting steps for module WS-X6704-10GE?
Only two ports of module WS-X6704-10GE are being used, for connectivity between the primary and secondary switches.
11-16-2011 06:08 AM
Hi Samarjit,
Can you please attach the output of the following commands from those ports (let's call a port GiX/Y):
show int GiX/Y --- 3 times
show buffer input-int GiX/Y --- 3 times
show counters int GiX/Y --- 2 times
show queueing int GiX/Y --- 2 times
show int GiX/Y count err --- 3 times
show int GiX/Y switching --- 3 times
The CPU on the card is also high due to traffic - please check whether any ACL is configured on these ports, e.g., with the log option.
Nik
11-16-2011 10:29 AM
11-16-2011 07:02 PM
Thanks Samarjit,
Those look good. I suspect something else. Can you please get the following logs:
From SUP:
show platform hardware capacity
on the line card:
show stack
sh platform netint
show platform hardware gemini interrupts - a few times within 5 minutes
Nik
11-16-2011 10:52 PM
11-17-2011 07:13 PM
Thanks Samarjit,
I see the Gemini interrupts that I suspected.
E.g.:
Interrupt stats on Module 1, Unit 1 - Ports 1, 2 :
Int=tc_int , num=0 Int=ed_int , num=2173331
Int=er_int , num=2173331 Int=nf_int , num=0
There is one DDTS for CFC line cards where these interrupts appear erroneously and cause high CPU, so here on a DFC it may be something similar, or the same Gemini register is corrupted.
You can try the following workaround suggested for CFC cards:
Enter the following commands in enable mode:
remote login module
show platform hardware gemini poke 0x5 0x18
Please do it in a maintenance window. DE advises that it is harmless, but as I said, that was for CFC cards, so additional caution should be taken in our case. If it does not help, I would recommend physically reseating the line card in the slot during the same maintenance window and seeing how it behaves.
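After applying the poke (or reseating the card), you can verify whether the workaround took effect by re-checking the interrupt counters a few times and watching the CPU on the card, using the same commands as before:

```
show platform hardware gemini interrupts
show processes cpu detailed
```

If ed_int/er_int stop incrementing and the CPU drops, the workaround has worked.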
Please let me know the results.
Nik
11-17-2011 08:53 PM
Hi Nik
Output of command:
show platform hardware gemini poke 0x5 0x18
Set-register on Module 1, Unit 1 - Ports 1, 2 :
Set register 0x0005: GM_CI_INT_STATUS = 0x00000018 [24] mask 0xFFFFFFFF
Set-register on Module 1, Unit 2 - Ports 3, 4 :
Set register 0x0005: GM_CI_INT_STATUS = 0x00000018 [24] mask 0xFFFFFFFF