07-27-2011 08:14 PM - edited 03-07-2019 01:26 AM
Hi there...
I'd like to introduce my problem; I'm looking for professional support.
First of all, I'm an IT engineer working for a small telecommunications operator. A few days ago we implemented a new load-balancing technique across 4 different ISPs.
The load-balancing part of our network consists of 4 routers, each serving as the gateway to one of the 4 ISPs, and a core switch, a 6506E with a sup-720 engine, which load-balances between the ISPs. The total bandwidth is 300 Mbps, providing Internet service to approximately 4000 online users daily. On the live network, the CPU utilization of the 6506E reached 99%.
The network implementation is as simple as this:
The 6506 runs EIGRP with the 4 ISP routers, and floating NAT is implemented with route-maps that match the next-hop addresses.
Each of the 4 ISP routers has a default route plus a tracking object, redistributed into EIGRP, so that the 6506 load-balances between the redistributed default routes, which have different metrics.
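EIGRP only installs routes with different metrics side by side when `variance` is set above 1, and only for paths that also pass the feasibility condition (the neighbor's reported distance must be lower than the best metric). The Python sketch below models that rule. The ISP names and metric values are made up for illustration (the real ones are in the attached config), and `traffic_shares` is a simplification that splits traffic in inverse proportion to the metric, not the exact IOS traffic-share-count computation.

```python
from dataclasses import dataclass


@dataclass
class Route:
    next_hop: str
    feasible_distance: int  # total metric to the destination via this neighbor
    reported_distance: int  # metric the neighbor itself advertises


def usable_routes(routes, variance=1):
    """Return the routes EIGRP would install for unequal-cost load balancing."""
    best = min(r.feasible_distance for r in routes)
    chosen = []
    for r in routes:
        is_successor = r.feasible_distance == best
        # Feasibility condition: neighbor's reported distance < our best metric.
        feasible = r.reported_distance < best
        # Variance condition: metric must be less than variance * best metric.
        within_variance = r.feasible_distance < variance * best
        if is_successor or (feasible and within_variance):
            chosen.append(r)
    return chosen


def traffic_shares(routes):
    """Simplified split: each installed path gets a share proportional to 1/metric."""
    weights = {r.next_hop: 1 / r.feasible_distance for r in routes}
    total = sum(weights.values())
    return {hop: w / total for hop, w in weights.items()}


# Hypothetical metrics for the four redistributed default routes.
defaults = [
    Route("ISP1", 3072, 2816),
    Route("ISP2", 5120, 2816),
    Route("ISP3", 6144, 2816),
    Route("ISP4", 10240, 2816),
]

for v in (1, 2, 4):
    hops = [r.next_hop for r in usable_routes(defaults, variance=v)]
    print(f"variance {v}: {hops}")
```

With these numbers, variance 1 keeps only ISP1, variance 2 adds ISP2, and variance 4 installs all four paths.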
To make this challenge more concrete, I've attached the network diagram to this discussion, along with the configuration file of the Cat 6506E.
Any ideas why the CPU usage is at 99%?
Thanks in advance...
"Many words will not fill a bushel..."
07-28-2011 04:22 AM
Hi there,
There is a lot of troubleshooting to do.
Run through this website if you haven't already:
http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a00804916e0.shtml
07-28-2011 05:34 AM
I do not quite understand the use of matching ip next-hop in the route maps:
route-map next-FIBER2 permit 10
 match ip address USERS
 match ip next-hop next-FIBER2
Routing is performed before NAT ( http://www.cisco.com/en/US/partner/tech/tk648/tk361/technologies_tech_note09186a0080133ddd.shtml ), so CEF already knows the next hop. There is no need to check it again in a route map. Besides, the route map matches on any line, and the first one already matches every packet.
The command reference describes match ip next-hop as follows: "To redistribute any routes that have a next hop router address passed by one of the access lists specified, use the match ip next-hop command in route-map configuration mode."
http://www.cisco.com/en/US/partner/docs/ios/12_3/iproute/command/reference/ip2_k1g.html#wp1038187
There is nothing there about using this command for NAT purposes. As I see it, this should also work without this line in every route map, although I'm not sure removing it would reduce your CPU load.
Other remarks:
regards,
Leo
07-28-2011 11:16 AM
Run "sh processes cpu sorted";
it will tell you exactly what is using up the CPU.
I agree with Leo about the next-hop matches... they're not needed. The NAT-ing gets done on the outgoing interface.
I'd also recommend moving your NAT onto the 4 routers...
07-28-2011 01:30 PM
Guys,
I'm load balancing between the four routes, so I need four NAT statements, each translating to the address segment of one of the four routes after the routing decision is made. Each NAT statement uses a route-map that matches the next hop, which in this case is the IP address of one of the four routers.
If the hashing algorithm chooses the first route, the NAT statement that matches that route's next hop is applied, and the same goes for the other routes... How can you NAT to different routes without matching the next-hop IP address?
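The selection logic described above (route first, then apply the NAT statement whose route-map matches the chosen next hop) can be modeled in a few lines of Python. This is a sketch under stated assumptions: the `USERS` network, next-hop addresses, and pool names are hypothetical placeholders for the values in the attached config, and `zlib.crc32` stands in for the switch's real CEF hash, which is different. The second function shows what the same route-maps do with the next-hop match removed.

```python
import ipaddress
import zlib

# Hypothetical addressing: the real ACL, next hops and pools are in the attached config.
USERS = ipaddress.ip_network("10.0.0.0/16")
NEXT_HOPS = ["192.0.2.1", "192.0.2.5", "192.0.2.9", "192.0.2.13"]
POOLS = ["POOL-ISP1", "POOL-ISP2", "POOL-ISP3", "POOL-ISP4"]


def pick_next_hop(src, dst):
    """Stand-in for per-destination load sharing: hash (src, dst) onto one path.
    zlib.crc32 is NOT the platform hash; it only keeps each pair on one path."""
    return NEXT_HOPS[zlib.crc32(f"{src}-{dst}".encode()) % len(NEXT_HOPS)]


def nat_pool_with_next_hop(src, next_hop):
    """Route-maps checked in order; each requires the source to be in USERS
    AND the chosen next hop to equal that route-map's next hop."""
    for hop, pool in zip(NEXT_HOPS, POOLS):
        if ipaddress.ip_address(src) in USERS and next_hop == hop:
            return pool
    return None  # no route-map matched: no translation


def nat_pool_source_only(src, next_hop):
    """Same route-maps with the next-hop match removed (next_hop is ignored):
    the first route-map matches every user packet."""
    return POOLS[0] if ipaddress.ip_address(src) in USERS else None


# Routing is decided first, then NAT chooses a pool based on that decision.
for src in ("10.0.1.10", "10.0.2.20", "10.0.3.30"):
    hop = pick_next_hop(src, "198.51.100.7")
    print(src, hop, nat_pool_with_next_hop(src, hop), nat_pool_source_only(src, hop))
```

In the model, keeping the next-hop match makes the pool follow the chosen exit, while dropping it sends every user flow to the first pool regardless of which ISP router the flow actually uses.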
As for the bandwidth command, I set it in Kbps for simplicity and to make the traffic share ratio more accurate; I don't believe it's the reason behind the high CPU usage.
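For reference, with default K values the classic EIGRP metric depends only on the lowest `bandwidth` along the path (the `bandwidth` interface command takes kilobits per second) and the cumulative delay, so the `bandwidth` values drive the metric ratios that unequal-cost load sharing works from. A minimal Python sketch of that formula; the four bandwidth values and the 100-microsecond delay are hypothetical, not taken from the attached config.

```python
def eigrp_classic_metric(min_bandwidth_kbps, total_delay_usec):
    """Classic EIGRP composite metric with default K values (K1=K3=1, K2=K4=K5=0).

    Bandwidth is the lowest `bandwidth` (kbps) along the path; delay is the sum
    of interface delays in microseconds, counted in tens of microseconds."""
    scaled_bandwidth = 10**7 // min_bandwidth_kbps  # integer division, as IOS truncates
    scaled_delay = total_delay_usec // 10
    return 256 * (scaled_bandwidth + scaled_delay)


# Hypothetical per-ISP bandwidth statements (kbps) that add up to 300 Mbps,
# all with the same 100-microsecond delay.
for kbps in (100000, 75000, 75000, 50000):
    print(kbps, eigrp_classic_metric(kbps, 100))
```

A lower `bandwidth` gives a higher metric, so the link configured with the smallest value ends up with the smallest share of traffic once `variance` lets it be installed.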
Also, I had already run "show proc cpu sor", and the only process with high CPU utilization
was NAT, at 20%; a few other processes fluctuated between 0 and 1%.
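The first line of `show processes cpu` output reports the five-second figure as two numbers, total/interrupt, where the second is the CPU time spent at interrupt level (for example, packets switched in the interrupt path). That time is not attributed to any process in the per-process list, so the listed percentages can add up to much less than the total. A small Python helper that splits the header; the sample line is invented to mirror the 99% total and roughly 20% process share described in this thread, not copied from the actual switch.

```python
import re

# Header line of "show processes cpu": total% / interrupt-level% for the last five seconds.
FIVE_SEC = re.compile(r"five seconds:\s*(\d+)%/(\d+)%")


def split_cpu_header(line):
    """Split the 5-second figure into interrupt-level and process-level parts."""
    match = FIVE_SEC.search(line)
    if match is None:
        raise ValueError("not a 'CPU utilization for five seconds' line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


# Made-up header using the percentages described in this thread.
header = "CPU utilization for five seconds: 99%/79%; one minute: 98%; five minutes: 97%"
print(split_cpu_header(header))
```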
07-29-2011 12:23 AM
Hi,
I would suggest controlling the NAT translations by configuring the commands below in global configuration mode.
This will help reduce the burden on the router CPU.
ip nat translation tcp-timeout 600
ip nat translation udp-timeout 600
Please rate helpful posts.
Regards,
Naidu.
07-29-2011 06:24 AM
I believe the default TCP translation timeout is 24 hours (86400 seconds), so if I changed the timers to 10 minutes, this would definitely cause some applications to fail, especially when someone tries to watch an online video longer than 10 minutes on websites like YouTube. Plus, the translation timeout doesn't affect the CPU; it consumes MEMORY, because the translations are cached in memory.
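IOS translation timeouts are inactivity timers: an entry stays in the table for the flow's active time plus the timeout after its last packet, and a translation that keeps carrying packets keeps being refreshed. Little's law (entries = arrival rate × time in table) then gives a rough estimate of how the timeout drives table size and therefore memory. A minimal Python sketch; the flow rate and flow duration are hypothetical, not measured on this network, and the estimate is an upper bound because it ignores flows that are cleaned up early after FIN/RST.

```python
def expected_translations(new_flows_per_sec, avg_active_sec, idle_timeout_sec):
    """Little's law (L = lambda * W) applied to the NAT table.

    W is how long an entry lives: the time the flow is active plus the idle
    timeout after its last packet. Assumes every flow waits out the full idle
    timeout (no early FIN/RST cleanup), so this is an upper bound."""
    return new_flows_per_sec * (avg_active_sec + idle_timeout_sec)


# Hypothetical load: 200 new TCP flows per second, each active for 30 seconds.
for timeout in (600, 86400):
    print(timeout, expected_translations(200, 30, timeout))
```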
The NAT "process" applied to each packet that egresses or ingresses the outside interface is what impacts the CPU, so if I wanted to reduce the NAT burden on the CPU I would limit the maximum number of translation entries, but that is a very dangerous thing to do since I have 4000 online users daily.
Dude, why are we discussing NAT? It only accounts for 20% of the CPU; the problem is the other 79%, and I don't know yet where it is coming from.
Seriously, I've started to doubt that the sup-720 engine can handle 40,000 sessions and 300 Mbps.
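A quick sanity check for the sup-720 question is to convert 300 Mbps into packets per second and see what load would reach the route processor CPU if some fraction of that traffic were switched in software instead of in hardware. A minimal Python sketch; the average packet sizes and the 10% punt fraction are assumptions for illustration, not measurements from this network.

```python
def packets_per_second(throughput_bps, avg_packet_bytes):
    """Packets per second needed to carry a given throughput at an average packet size."""
    return throughput_bps / (avg_packet_bytes * 8)


def punted_pps(total_pps, punted_fraction):
    """Packets per second reaching the route processor if this fraction is software-switched."""
    return total_pps * punted_fraction


# 300 Mbps comes from the post; the packet sizes and the 10% punt ratio are hypothetical.
for size in (64, 500, 1500):
    pps = packets_per_second(300e6, size)
    print(f"{size} B: {pps:,.0f} pps total, {punted_pps(pps, 0.10):,.0f} pps if 10% are punted")
```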
And one more thing: do you think adding a module and using physical interfaces instead of subinterfaces would affect the CPU utilization?