11-14-2011 12:25 AM - edited 03-07-2019 03:22 AM
Can someone tell me which process is consuming high memory on my C6509 switch? File attached for reference.
11-18-2011 06:39 AM
Hi Samarjit,
So something is wrong with the registers - we forced those to correct values and saw the problem gone for a while. Then they reverted to incorrect ones. I would try to physically reseat the card and see if it helps. Otherwise it can be a HW problem or a new DDTS - then you'll need to open a TAC case to dig deeper.
Nik
11-14-2011 05:20 AM
Hi,
Can you please start with "show proc mem" and "show mem sum", plus "show ver", so we can understand the details of the node? The attached output is quite hard to read without knowing the background.
Nik
11-14-2011 10:21 PM
11-15-2011 01:47 AM
Thanks,
I see most of the memory is taken by ios-base. That is a cumulative system process.
Can you get
show process memory detail ios-base
show process memory detail ios-base taskid
show memory detailed ios-base dead
to check it.
Nik
11-15-2011 02:02 AM
11-15-2011 03:37 AM
Ok,
So, to answer your initial question:
"Can someone tell me which process is consuming high memory on my C6509 switch?"
Most of the memory is held by ios-base, which is the main cumulative process. Below you can see the sub-processes within it.
sh processes memory detailed ios-base
System Memory : 524288K total, 351957K used, 172331K free, 1000K kernel reserved
Lowest(b) : 176058368
Process sbin/ios-base, type IOS, PID = 16407
163756K total, 87504K text, 4K data, 96K stack, 76152K dynamic
Heap : 80733024 total, 76857816 used, 3875208 free
Task TTY Allocated Freed Holding Getbufs Retbufs TaskName
0 0 79395328 1752 79056968 0 0 *Init*
0 0 1293906040 1232002068 8458308 6781852 0 *Dead*
229 0 1113240 56175112 848392 0 0 FM core
14 0 76093160 75601240 492792 62843372 62843372 Pool Manager
37 0 1585208 595648 453752 0 0 IPC Seat Manage
49 0 3017944 2607832 352312 0 0 rf proxy rp age
326 0 1167632 777256 344048 0 0 RPC pagp_switch
27 0 692144 372488 316008 0 0 Entity MIB API
201 0 5891384 5571680 297824 0 0 ARP HA
39 0 268560 624 268808 120600 0 EEM ED Syslog
344 0 384282544 383994632 254624 0 0 Port manager pe
117 0 440448 245248 176896 0 0 PF_Init Process
4 0 1577055416 1576708056 151024 0 0 Service Task
121 0 1705624 1225664 145248 0 0 CHKPT rcv MSG
221 0 218360 109680 135544 0 0 XDR mcast
404 0 20331912 20075512 125896 0 0 SNMP ConfCopyPr
363 0 243040 55136 117432 0 0 Entity MIB C6k
5 0 988752 918200 115592 0 0 Service Task
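As a side note, if you save output like the table above to a file, a short script can rank the tasks by the Holding column. This is just a convenience sketch (not part of IOS); the sample lines are taken from the output above:

```python
# Rank tasks from "show processes memory detailed" output by the Holding
# column (5th field). Sample lines copied from the output above.
task_lines = [
    "0 0 79395328 1752 79056968 0 0 *Init*",
    "0 0 1293906040 1232002068 8458308 6781852 0 *Dead*",
    "229 0 1113240 56175112 848392 0 0 FM core",
    "14 0 76093160 75601240 492792 62843372 62843372 Pool Manager",
]

def holding_bytes(line):
    # Columns: Task TTY Allocated Freed Holding Getbufs Retbufs TaskName
    return int(line.split()[4])

ranked = sorted(task_lines, key=holding_bytes, reverse=True)
for line in ranked:
    # maxsplit=7 keeps multi-word task names (e.g. "Pool Manager") intact
    print(f"{holding_bytes(line):>12}  {line.split(maxsplit=7)[7]}")
```

With the sample data, *Init* comes out on top, matching what we see in the raw output.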
If you want to break this down further, you can run
show process memory detail ios-base taskid
for each particular task.
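For example, to drill into the *Init* task shown above, the taskid value comes from the Task column of the table (0 for *Init* here) - exact syntax may vary slightly by release:

```
show process memory detail ios-base taskid 0
```

Repeat with the taskid of any other task whose Holding value looks suspicious.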
Please let me know if you have any further queries.
Nik
11-15-2011 04:38 AM
Hi Nik
Thanks for your thorough analysis. What I can see is that the task named "Init" is consuming a lot of memory and not releasing it, compared to the other tasks. Is there any workaround that can reduce the memory consumption?
Actually, I was troubleshooting a problem with a Cisco 6509 switch and found the memory usage high for this process. The initial problem started with ports of a particular module automatically flapping up/down without any reason. Later, the SUP engine (WS-SUP720-3B) started restarting automatically. I reset the SUP engine but the problem was not resolved. Just now, when I logged into the module (WS-X6704-10GE) where the ports were initially oscillating between up and down, I saw the CPU utilization of that module running at 100%. I have attached the tech-support of the module for your reference.
11-15-2011 07:25 AM
Hi Samarjit,
The Init process is responsible for ION initialization and for starting the other system processes that run your OS, so it allocates its memory at startup and usually holds the same amount throughout operation. Also, you have enough free memory, so that should not be an issue unless it is leaking continuously:
------------------ show process memory detailed ------------------
System Memory : 262144K total, 154525K used, 107619K free, 1000K kernel reserved
Regarding the high CPU - it is traffic driven. I mean some traffic is being punted to the CPU:
------------------ show process cpu detailed ------------------
CPU utilization for five seconds: 100%; one minute: 100%; five minutes: 100%
12307 99.7% 98.9% 98.5% ios-base 3h55m
1 0.1% 0.2% 0.2% 22 Intr 47.972
2 0.6% 0.6% 0.6% 5 Ready 79.937
3 0.3% 0.1% 0.2% 10 Receive 4.068
4 0.0% 0.0% 0.0% 10 Receive 11.435
5 0.0% 0.0% 0.0% 11 Nanosleep 0.632
6 94.9% 96.4% 96.4% 22 Intr 3h48m
Process sbin/ios-base, type IOS, PID = 12307
CPU utilization for five seconds: 0%/99%; one minute: 0%; five minutes: 0%
The /99% means interrupt-driven load, i.e., the CPU is handling particular packets at interrupt level.
Please check if any of your interfaces have the following:
- High broadcast rate
- Input drops
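For example (command forms may vary slightly by IOS release; GiX/Y stands for the interface in question), something like:

```
show interfaces GiX/Y | include broadcasts|input
show interfaces GiX/Y counters errors
```

Run each a couple of times and compare the broadcast and drop counters to see whether they are incrementing.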
You can run the following debug (safe to run in production) to see what packets are punted to the CPU:
debug netdr cap rx
show netdr cap
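The abbreviated forms above expand to the following on most SUP720 images (the clear command resets the capture buffer so you can take a fresh sample):

```
debug netdr capture rx
show netdr captured-packets
debug netdr clear-capture
```

Look at the source MAC/IP of the captured packets to identify who is flooding the CPU.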
One more thing regarding the crashes - those seem to be resets due to lost power:
System returned to ROM by power-on
I would also advise checking that everything is good with the power sources and the environment in the server room - temperature, etc.
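On the 6500 these can be checked from the CLI, for example:

```
show environment status
show environment temperature
show power
```

Any alarms, failed supplies, or over-temperature readings would show up here.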
Nik
11-16-2011 02:59 AM
Hi Nik
Thanks for your great help. You have provided fantastic analysis of the case data as well as solid troubleshooting steps. I ran the debug command you suggested and caught huge broadcasts generated by one server in the data center. After filtering those broadcasts close to the source, the CPU utilization of the SUP engine dropped, but the module WS-X6704-10GE, for which I shared the tech-support yesterday, is still undergoing high CPU (100% at all times). Can you please suggest some additional troubleshooting steps for module WS-X6704-10GE?
Only two ports of module WS-X6704-10GE are being used, for connectivity between the primary and secondary switches.
11-16-2011 06:08 AM
Hi Samarjit,
Can you please attach the output of the following commands from those ports (let's call a port GiX/Y):
show int GiX/Y --- 3 times
show buffer input-int GiX/Y --- 3 times
show counters int GiX/Y --- 2 times
show queueing int GiX/Y --- 2 times
show int GiX/Y count err --- 3 times
show int GiX/Y switching --- 3 times
The CPU on the card is also high due to traffic - please check whether any ACL is configured on these ports, e.g., with the log option.
Nik
11-16-2011 10:29 AM
11-16-2011 07:02 PM
Thanks Samarjit,
Those look good. I suspect something else. Can you please get the following logs:
From SUP:
show platform hardware capacity
on the line card:
show stack
sh platform netint
show platform hardware gemini interrupts - a few times within 5 minutes
Nik
11-16-2011 10:52 PM
11-17-2011 07:13 PM
Thanks Samarjit,
I see the Gemini interrupts that I suspected.
E.g.:
Interrupt stats on Module 1, Unit 1 - Ports 1, 2 :
Int=tc_int , num=0 Int=ed_int , num=2173331
Int=er_int , num=2173331 Int=nf_int , num=0
There is one DDTS for CFC line cards where these interrupts appear erroneously and cause high CPU, so here on a DFC it may be something similar, or the same Gemini register is corrupted.
You can try the following workaround suggested for CFC cards:
Enter the following commands in enable mode:
remote login module
show platform hardware gemini poke 0x5 0x18
Please do it in a maintenance window. DE advises that it is harmless, but as I said, that was for CFC cards, so additional caution should be taken in our case. If it does not help, I would recommend physically reseating the line card in the slot during the same maintenance window and seeing how it behaves.
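After applying the poke (or reseating the card), you can verify whether the workaround took effect by re-checking the interrupt counters a few times and watching the CPU on the card, using the same commands as before:

```
show platform hardware gemini interrupts
show processes cpu detailed
```

If ed_int/er_int stop incrementing and the CPU drops, the workaround has worked.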
Please let me know the results.
Nik
11-17-2011 08:53 PM
Hi Nik
Output of command:
show platform hardware gemini poke 0x5 0x18
Set-register on Module 1, Unit 1 - Ports 1, 2 :
Set register 0x0005: GM_CI_INT_STATUS = 0x00000018 [24] mask 0xFFFFFFFF
Set-register on Module 1, Unit 2 - Ports 3, 4 :
Set register 0x0005: GM_CI_INT_STATUS = 0x00000018 [24] mask 0xFFFFFFFF