High CPU utilization in a real production environment, please review...

Muhanad.Ali
Level 1

Hi there...

I'd like to describe a problem I'm facing and ask for professional advice.

First off, I'm an IT engineer at a small telecommunications operator. A few days ago we implemented a new load-balancing scheme across 4 different ISPs. The load-balancing part of our network consists of 4 routers, each serving as the gateway to one of the 4 ISPs, and a 6506E core switch with a Sup-720 engine that load balances across the ISPs. The total bandwidth is 300 Mbps, providing internet service to approximately 4000 online users daily. On the live network, CPU utilization on the 6506E reached 99%.

The network implementation is as simple as this:

The 6506 runs EIGRP with the 4 ISP routers, and floating NAT is implemented using route maps that match the next-hop addresses.

Each of the 4 ISP routers has a default route with a tracking object, redistributed into EIGRP, so that the 6506 load balances across the redistributed default routes, which have different metrics.

To make this challenge more realistic, I've attached the network diagram to this discussion, along with the configuration file of the Cat 6506E.

Any ideas why the CPU usage is 99%?

Thanks in advance...

"Many words will not fill a bushel..."

6 Replies

simon.dwyer
Level 1

Hi there,

There is a lot of troubleshooting to do.

Run through this page if you haven't already:

http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a00804916e0.shtml

lgijssel
Level 9

I do not quite understand the use of matching ip next-hop in the route maps:

route-map next-FIBER2 permit 10
 match ip address USERS
 match ip next-hop next-FIBER2

Routing is performed before NAT ( http://www.cisco.com/en/US/partner/tech/tk648/tk361/technologies_tech_note09186a0080133ddd.shtml ) so CEF already knows the next hop. There is no need to check it again in a route map. Besides, the route map matches any line and the first one already matches for every packet.

In the command reference it says that match ip next-hop is used: To redistribute any routes that have a next hop router address passed by one of the access lists specified, use the match ip next-hop command in route-map configuration mode.

http://www.cisco.com/en/US/partner/docs/ios/12_3/iproute/command/reference/ip2_k1g.html#wp1038187

Nothing there about using this command for NAT purposes. As I see it, this should also work without this line in every route map. I'm not sure that removing it will reduce your CPU load.

Other remarks:

  • Your bandwidth commands need to be multiplied by 1000 (bandwidth is expressed in kb/s).
  • See what happens when you swap the NAT pool for a single IP address (the subinterface IP). I know this may lead to NAT exhaustion, but the NAT pool is another potential source of high CPU.
  • Otherwise, use the troubleshooting tools suggested by the previous poster to find the responsible process.
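
The kb/s point in the first remark can be checked with a few lines of Python. This is only a sketch: the per-link speeds below are a hypothetical split of the 300 Mbps total, since the actual values are in the attached configuration and not quoted in the thread.

```python
def mbps_to_bandwidth_kbps(mbps: float) -> int:
    """Convert a link speed in Mbit/s to the kbit/s value used by the `bandwidth` command."""
    return int(round(mbps * 1000))

# Hypothetical split of the 300 Mbps total across the four uplinks (assumption).
links_mbps = {"ISP1": 100, "ISP2": 100, "ISP3": 50, "ISP4": 50}

for name, mbps in links_mbps.items():
    # The printed value is what would follow `bandwidth` under the interface.
    print(f"{name}: bandwidth {mbps_to_bandwidth_kbps(mbps)}")
```

Because the ratio between links is what matters for unequal-cost load sharing, values that are all off by the same factor keep the same ratio, but they misstate the absolute speed that the EIGRP metric is computed from.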

regards,

Leo

nqtran1979
Level 1

Run "sh processes cpu sorted".

This will tell you exactly what is using up the CPU.
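
When reading that output, the header line is worth a look as well as the process list: in `X%/Y%`, X is the total utilization over five seconds and Y is the share spent at interrupt level, so X minus Y is what the listed processes account for. A small Python sketch of that split follows; the sample line is made up for illustration and is not output captured from the switch in this thread.

```python
import re

# Header line format of `show processes cpu`:
#   CPU utilization for five seconds: <total>%/<interrupt>%; one minute: <n>%; five minutes: <n>%
HEADER_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)

def split_cpu(header_line: str) -> dict:
    """Split the five-second figure into interrupt-level and process-level CPU."""
    m = HEADER_RE.search(header_line)
    if m is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,
        "process_5s": total - interrupt,  # what the per-process list adds up to
        "one_min": one_min,
        "five_min": five_min,
    }

# Hypothetical sample line (assumption, not real output from this network).
sample = "CPU utilization for five seconds: 60%/45%; one minute: 58%; five minutes: 55%"
print(split_cpu(sample))
```

A large interrupt-level share generally points at traffic being switched by the CPU rather than at any single process.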

I agree with Leo about the next-hop matches ... not needed. The NAT gets done on the outgoing interface.

I'd also recommend moving your NAT onto the 4 routers ...

Guys,

I'm load balancing across the four routes, so I need four NAT statements, each translating to one of the four segments of the four routes after the routing decision is made. Each NAT statement uses a route map that matches the next hop, which in this case is the IP address of one of the four routers.

If the hashing algorithm chooses the first route, then the NAT statement that matches the next hop of that route applies, and the same goes for the other routes. How can you NAT to different routes without matching the next-hop IP address?
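
The design described in this reply can be pictured with a toy model in Python. It is a sketch of the reasoning only: the CRC-based hash stands in for whatever per-flow hash the switch really uses, `next-FIBER2` is the only name taken from the thread, and the other next-hop names and all pool names are hypothetical.

```python
import zlib

# Only "next-FIBER2" appears in the thread; the other names are assumptions.
NEXT_HOPS = ["next-FIBER1", "next-FIBER2", "next-FIBER3", "next-FIBER4"]

# One NAT statement (pool) per next hop; pool names are hypothetical.
NAT_POOL_FOR = {hop: f"POOL-{i}" for i, hop in enumerate(NEXT_HOPS, start=1)}

def pick_next_hop(src_ip: str, dst_ip: str) -> str:
    """Stand-in for per-flow load sharing: hash the flow, pick one of four routes."""
    key = f"{src_ip}->{dst_ip}".encode()
    return NEXT_HOPS[zlib.crc32(key) % len(NEXT_HOPS)]

def pick_nat_pool(next_hop: str) -> str:
    """Stand-in for the route-map match: the chosen next hop selects the NAT statement."""
    return NAT_POOL_FOR[next_hop]

hop = pick_next_hop("10.0.0.25", "198.51.100.7")
print(hop, pick_nat_pool(hop))
```

The same flow always hashes to the same next hop, so it always gets the same pool; that is the property the four route maps are meant to provide.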

As for the bandwidth command, I set it in Kbps for simplicity and to make the traffic share ratio more accurate; I don't believe it's the reason behind the high CPU usage.

Also, I've already run "show proc cpu sor", and the only process with high CPU utilization was NAT, at 20%; a few other processes fluctuated between 0 and 1%.

Hi,

I would suggest controlling the NAT translations by configuring the commands below in global configuration mode.

This will help reduce the load on the router CPU.

ip nat translation tcp-timeout 600

ip nat translation udp-timeout 600

Please rate the helpful posts.

Regards,

Naidu.

I think the default timeout for TCP translations is somewhere around 20 hours, so if I tweak the timers down to 10 minutes, this will definitely cause some applications to fail, especially when someone watches an online video longer than 10 minutes on sites like YouTube. Plus, the translation timeout doesn't affect the CPU; it consumes memory, since translations are cached in memory.
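
The link between timeout and memory can be estimated with a rough steady-state model: entries in the table are about the arrival rate of new sessions multiplied by how long each entry lingers. The Python sketch below assumes every entry stays for the full timeout and uses a hypothetical rate of 50 new sessions per second; neither figure comes from measurements on this network.

```python
def steady_state_entries(new_sessions_per_sec: float, timeout_sec: float) -> int:
    """Rough upper bound on table size: arrival rate times entry lifetime."""
    return int(new_sessions_per_sec * timeout_sec)

RATE = 50  # new sessions per second (assumption for illustration)

ten_minutes = 600            # the timeout proposed in the previous reply
twenty_hours = 20 * 3600     # the default as estimated in this reply

print("10 min timeout:", steady_state_entries(RATE, ten_minutes), "entries")
print("20 h timeout:  ", steady_state_entries(RATE, twenty_hours), "entries")
```

In practice many entries are removed earlier than the timeout, so real tables are smaller, but the proportionality between timeout and table size still holds.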

The NAT processing of each packet that enters or leaves the outside interface is what loads the CPU, so if I wanted to reduce the NAT burden on the CPU I would limit the maximum number of translation entries, but that is a very dangerous thing to do since I have 4000 online users daily.

Dude, why are we discussing NAT? It only accounts for 20% of the CPU; the problem is the other 79%, and I don't yet know where it is coming from.

Seriously, I'm starting to doubt that the Sup-720 engine can handle 40,000 sessions and 300 Mbps.

One more thing: do you think adding a module and using physical interfaces instead of subinterfaces would affect CPU utilization?
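
To put the 40,000 sessions and 300 Mbps in perspective, here is a back-of-the-envelope calculation in Python. The user, session and bandwidth figures are the ones quoted in this thread; the 500-byte average packet size is an assumption, so the packet rate is only indicative.

```python
TOTAL_BPS = 300_000_000   # 300 Mbps total, from the thread
USERS = 4000              # approximate online users, from the thread
SESSIONS = 40_000         # sessions, from the thread
AVG_PACKET_BYTES = 500    # assumed average packet size (not from the thread)

def sessions_per_user(sessions: int, users: int) -> float:
    return sessions / users

def kbps_per_user(total_bps: int, users: int) -> float:
    return total_bps / users / 1000

def packets_per_second(total_bps: int, avg_packet_bytes: int) -> float:
    return total_bps / (avg_packet_bytes * 8)

print("sessions per user:", sessions_per_user(SESSIONS, USERS))
print("kbps per user at full load:", kbps_per_user(TOTAL_BPS, USERS))
print("packets per second at full load:", packets_per_second(TOTAL_BPS, AVG_PACKET_BYTES))
```

The packet rate is the number to compare against, because any packet that has to be handled by the CPU instead of in hardware adds to the load packet by packet, regardless of how many sessions are open.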
