02-24-2016 10:31 PM - edited 03-05-2019 03:25 AM
Hi,
I have a 6509-E chassis deployed as a WAN termination router, with only the following links connected to it:
Now I am getting high CPU (touching 100%) during working hours, 10 A.M. to 6 P.M. Here are the outputs from the switch:
#sh mod
Mod Ports Card Type Model Serial No.
--- ----- -------------------------------------- ------------------ -----------
1 24 CEF720 24 port 1000mb SFP WS-X6824-SFP SAL1816QMG9
2 48 CEF720 48 port 10/100/1000mb Ethernet WS-X6848-GE-TX SAL1922FUFC
5 5 Supervisor Engine 2T 10GE w/ CTS (Acti VS-SUP2T-10G SAL1817R0YH
6 5 Supervisor Engine 2T 10GE w/ CTS (Hot) VS-SUP2T-10G SAL1815PZPK
Mod MAC addresses Hw Fw Sw Status
--- ---------------------------------- ------ ------------ ------------ -------
1 18e7.2820.7918 to 18e7.2820.792f 1.0 12.2(18r)S1 15.1(1)SY2 Ok
2 64f6.9df1.a950 to 64f6.9df1.a97f 1.4 12.2(18r)S1 15.1(1)SY2 Ok
5 6c41.6a0c.1fa2 to 6c41.6a0c.1fa9 1.7 12.2(50r)SYS 15.1(1)SY2 Ok
6 503d.e513.df71 to 503d.e513.df78 1.7 12.2(50r)SYS 15.1(1)SY2 Ok
Mod Sub-Module Model Serial Hw Status
---- --------------------------- ------------------ ----------- ------- -------
1 Distributed Forwarding Card WS-F6K-DFC4-A SAL1815QD6L 2.0 Ok
2 Distributed Forwarding Card WS-F6K-DFC4-A SAL1922FUFC 1.4 Ok
5 Policy Feature Card 4 VS-F6K-PFC4 SAL1815QA90 2.1 Ok
5 CPU Daughterboard VS-F6K-MSFC5 SAL1816QYZV 2.1 Ok
6 Policy Feature Card 4 VS-F6K-PFC4 SAL1816QJE0 2.1 Ok
6 CPU Daughterboard VS-F6K-MSFC5 SAL1814PNPC 2.1 Ok
#sh proc cpu sort 5s | ex 0.00
CPU utilization for five seconds: 92%/56%; one minute: 69%; five minutes: 72%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
442 12748176 4456882 2860 33.43% 4.84% 3.27% 0 IP NAT Ager
483 1917980 6322018 303 0.47% 0.29% 0.30% 0 XDR receive
783 29055636 10320646 2815 0.39% 0.11% 0.13% 0 NF SE Intr Task
481 145972 18540449 7 0.39% 0.31% 0.34% 0 XDR mcast
189 85928236 167659848 512 0.31% 0.85% 0.93% 0 slcp process
680 10455672 2555992 4090 0.23% 0.37% 0.39% 0 Env Poll
713 445108 6783233 65 0.15% 0.03% 0.01% 0 Port manager per
788 1050192 547783 1917 0.07% 0.05% 0.06% 0 OBFL INTR obfl0
438 804628 12303857 65 0.07% 0.04% 0.05% 0 IP Input
78 8163632 1000189 8162 0.07% 0.13% 0.13% 0 SEA write CF pro
832 346904 786932 440 0.07% 0.06% 0.07% 0 FNF Cache Ager P
353 407600 47383871 8 0.07% 0.05% 0.06% 0 EARL Intr Thrtl
#sh proc cpu his
6666666666777777777777777888886666666666666666666666666666
6333322222000003333300000000003333388888999996666699999000
100
90
80 *****
70 * ******************** ********************
60 **********************************************************
50 **********************************************************
40 **********************************************************
30 **********************************************************
20 **********************************************************
10 **********************************************************
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per second (last 60 seconds)
1
9789797788977889899999999999999899988899999998999088999879
2826588481865069229999599905988609969799729964799042999186
100 * * * * *#*#**** **** ** *** *** **** *** *
90 * * * * * ** **#####***#******#****####** #**# *** *
80 **##*** ******#*##############*###################*#####**
70 ##########################################################
60 ##########################################################
50 ##########################################################
40 ##########################################################
30 ##########################################################
20 ##########################################################
10 ##########################################################
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per minute (last 60 minutes)
* = maximum CPU% # = average CPU%
1 1 1111111 1 11 1
0431122122211233699998990331222121122439890000000523321223422335090090
0687802818879641899427990649462848615405590000000445088500067285090090
100 * ** *** * ******** ******
90 * ******** ********** ******
80 * ******#* ***#***##* *#****
70 # **#***##* **#######* *###*#
60 # *######## **#######* **#####
50 #* *######## **#######** *######
40 #** *########* * **########* * * **######
30 #** ** ***#########** * ****#########* *** * *******######
20 ##**************##########**************##########*************#######
10 ###*****##*################********###################################
0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.
0 5 0 5 0 5 0 5 0 5 0 5 0
CPU% per hour (last 72 hours)
* = maximum CPU% # = average CPU%
Kindly let me know if I can do something to lower the CPU usage, especially the interrupt percentage.
Regards,
Bhushit
02-25-2016 12:23 AM
Hello Bhushit,
>> CPU utilization for five seconds: 92%/56%; one minute: 69%; five minutes: 72%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
442 12748176 4456882 2860 33.43% 4.84% 3.27% 0 IP NAT Ager
The high CPU usage in interrupts is a sign that many traffic flows are being process switched in your C6509 instead of being handled by CEF.
Refer to the following documents to review your current configuration:
Best practices for IOS C6500
http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/24330-185.html
See the following for high CPU troubleshooting:
https://supportforums.cisco.com/document/59926/troubleshooting-high-cpu-6500-sup720
(I know you have a Sup2T, but it should be a valid starting point.)
I see that the NAT Ager process is taking about 33% of the five-second CPU, so I wonder whether NAT is being performed in software on your system.
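A quick first check from the CLI (standard IOS commands; counter names vary slightly by release) would be:

```
! Active translations plus hit/miss counters; fast-growing counters
! here mean the CPU is touching many NAT packets
show ip nat statistics
! Confirm CEF switching is enabled globally
show ip cef summary
```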
Hope to help
Giuseppe
02-25-2016 12:57 AM
Thanks Giuseppe,
All of my traffic is L3 traffic; that's why the interrupt percentage is high.
Also, how can I make sure that NAT is performed entirely in hardware?
Regards,
02-25-2016 01:19 AM
Hello Bhushit17,
>> All of my traffic is L3 traffic that's why this high interrupt percentage.
This is not true; the C6500 is a multilayer switch, so L3 routed traffic should also be forwarded in hardware.
You need to investigate why so many traffic flows are being punted to the main CPU, using the methods explained in the second link in my first post in this thread.
I'm afraid NAT is currently being performed in software for some reason, to be found in the current configuration or elsewhere.
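One way to see exactly which packets are being punted to the RP CPU is the netdr capture tool, which is designed to be safe even under high CPU (a sketch; verify availability on your release):

```
! Capture packets punted to the route processor
debug netdr capture rx
! After a few seconds, display what was captured
show netdr captured-packets
! Stop the capture when done
undebug all
```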
Hope to help
Giuseppe
02-25-2016 09:26 PM
From what I came to understand while investigating my config and chassis, here are my observations:
l2idb NULL, l3idb Gi2/1, routine inband_process_rx_packet, timestamp 15:11:45.716
dbus info: src_vlan 0x3F4(1012), src_indx 0x40(64), len 0x4C(76)
bpdu 0, index_dir 0, flood 0, dont_lrn 0, dest_indx 0x7FA3(32675)
cap1 0, cap2 0
C8020900 03F48400 00400000 4C000000 1E000414 22000004 00000000 7FA31383
destmac 18.9C.5D.5E.C8.C0, srcmac 00.03.B2.56.94.D3, shim ethertype CCF0
earl 8 shim header IS present:
version 0, control 64(0x40), lif 16420(0x4024), mark_enable 1,
feature_index 0, group_id 0(0x0), acos 0(0x0),
ttl 14, dti 4, dti_value 0(0x0)
ethertype 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 40, identifier 2450
df 1, mf 0, fo 0, ttl 126, src 10.10.3.93, dst 103.234.162.1
tcp src 62813, dst 22, seq 2472090721, ack 284071500, win 63552 off 5 checksum 0x53D8 ack
Thanks for help !!
02-26-2016 12:33 AM
Hello Bhushit17,
NetFlow export packets are generated locally, so I would expect them to be process switched.
Do you have NetFlow enabled on the links to the ISP?
You should also check NetFlow activity, but Leo is probably right in his post: a NAT Ager process that high can be a sign that you need an IOS upgrade.
A risk with NetFlow on the C6500 platform is running out of space in the NetFlow cache, but you have a Sup2T.
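To see how full the cache is, assuming traditional NetFlow is configured (the exact commands may differ on Sup2T releases), something like this can help:

```
! Summarizes flow cache size, active flows, and ager statistics
show ip cache flow
```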
However, from the show module output I see that you have the WS-F6K-DFC4-A and the VS-F6K-PFC4; those are the base (non-XL) modules.
With the Sup720, the only components suitable for Internet connectivity with full BGP tables were the XL versions (BXL and later CXL), because their extended memory can allocate CEF entries for all the Internet prefixes.
http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-6500-virtual-switching-system-1440/product_data_sheet0900aecd806ed759.html
see also
http://www.cisco.com/c/dam/en/us/products/collateral/switches/catalyst-6500-series-switches/C45_652087_00_catalyst_aag.pdf
It is strange that I cannot find a public datasheet for the Sup2T.
However, from the second link I see a limit of 256K IPv4 routes for the Sup2T with PFC4, whereas the Sup2T XL with PFC4 XL can support 1M IPv4 routes.
If you are receiving the full BGP table from your upstream provider, you have run out of CEF memory with your current components, and some traffic is process switched for this reason.
A full BGP table is on the order of 512,000 routes nowadays.
The exact number depends on the part of the world you are in, but this is the order of magnitude.
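To check how many prefixes are actually installed and learned, commands along these lines should work (standard IOS; output format varies by release):

```
! Totals per routing protocol, including BGP-learned prefixes
show ip route summary
! Prefixes received from each BGP neighbor
show ip bgp summary
```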
Hope to help
Giuseppe
Edit:
I see you have only one upstream provider, connected with a 3xGE bundle. However, if it is sending you the full BGP table, you are in trouble, as I explained above.
02-26-2016 12:52 AM
Yes, I have NetFlow enabled on the link to the ISP.
I am only receiving a single route (the default route) from the ISP through BGP, so that isn't a problem; I also haven't fully utilized the CEF route cache.
The high utilization seems to be due to NetFlow traffic and the NAT issue caused by the IOS. I also have an access list to match the traffic to be NATted on the same interface where NetFlow is enabled, which could lead to packets being process switched.
Can a route map instead of an ACL help me here?
02-26-2016 01:31 AM
Hello Bhushit17,
OK, if you are receiving only a default route from the ISP, you are fine with your HW components.
NetFlow activity is high because of the variety of traffic flows to and from the Internet.
If the NetFlow cache is full, the device tries to send more NetFlow accounting packets.
You should have 512K entries for the NetFlow cache.
>> also I have an access list to match traffic to be natted on that same interface I have enabled netflow
For NAT, you usually reference the ACL in a global configuration statement like
ip nat inside source list 50 interface Port-channel1 overload
where Port-channel1 is your port-channel to the ISP and ACL 50 specifies which addresses should be translated.
However, again, this type of configuration does not lead to process switching unless you use the log option in the ACL entries.
A route map for NAT plays the same role as the ACL but provides added flexibility in the match conditions, so that you can use, for example, other match criteria rather than only ACLs.
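A minimal sketch of route-map based NAT (the ACL range, route-map name, and interface are illustrative; adjust them to your setup):

```
! ACL selecting inside addresses (hypothetical range)
access-list 50 permit 10.10.0.0 0.0.255.255
! Route map referencing the ACL; additional match clauses can be added
route-map NAT-INSIDE permit 10
 match ip address 50
! NAT overload driven by the route map instead of a bare ACL
ip nat inside source route-map NAT-INSIDE interface Port-channel1 overload
```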
Hope to help
Giuseppe
03-02-2016 08:47 PM
Hi,
If I remove NetFlow from my WAN interface, switch utilization goes down to 10-15%, without changing any other parameter.
The most resource-heavy process now is "slcp process".
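For reference, a sketch of the change, assuming traditional NetFlow was configured on the port-channel (Flexible NetFlow would instead remove the corresponding ip flow monitor statements):

```
interface Port-channel1
 no ip flow ingress
 no ip flow egress
```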
Regards,
02-25-2016 02:17 AM
Hey Giuseppe,
How are you doing, mate? Haven't seen you (and Paolo) for a long time. Where have you two been hiding out?
02-26-2016 12:13 AM
Hello Leo,
I have been away from the forums for a long time indeed. I cannot answer for Paolo.
I needed a break so I took it.
In any case, you and the other guys are doing a nice job in the forums, and again, congratulations on entering the Hall of Fame. Your effort in the forums is considerable, and the recognition is well deserved.
You also add a human touch to the conversations, and that is important too.
Best Regards
Giuseppe
02-25-2016 02:17 AM
442 12748176 4456882 2860 33.43% 4.84% 3.27% 0 IP NAT Ager
NAT Ager on a Sup2T. Hmmmm ... Sounds very familiar.
Please try upgrading the IOS. I've had similar issues that were resolved only after an IOS upgrade.
02-25-2016 09:22 PM
Ohh! You encountered the same issue. Which IOS version resolved it for you?
Thanks,