Solved: QoS causing router CPU utilization to rise

Angelo ANELLO · ‎10-17-2012

Hi Guys,

We have recently implemented some QoS on our routers and I have noticed that the CPU usage has risen enormously as a result. Can someone advise if this is normal?

I checked the CPU Processes and found that the avg values are 50% usage. This seems rather high considering that only 75% of our edge routers have this feature enabled.

Our main router is a Cisco CISCO2911/K9 (revision 1.0) with 487424K/36864K bytes of memory. Our edge routers are either 2801 models or 881 models.

This is our first implementation of QoS, and we would like to ensure that it is working properly before implementing other QoS rules. Here is a copy of the config on the 2911 router:

class-map match-all Citrix
 match protocol citrix
class-map match-all Print
 match protocol printer
class-map match-all C-Coolingata
 match access-group name Coolingata
class-map match-all C-Hawthorn
 match access-group name Hawthorn
class-map match-all C-MonaVale
 match access-group name MonaVale
!
policy-map Shape2M
 class Citrix
  priority percent 80
  set dscp af41
 class Print
  bandwidth remaining percent 20
  set dscp af11
  random-detect
 class class-default
  bandwidth remaining percent 80
  random-detect
policy-map Global
 class C-Hawthorn
  shape average 2048000
  service-policy Shape2M
 class C-MonaVale
  shape average 2048000
  service-policy Shape2M
 class C-Coolingata
  shape average 2048000
  service-policy Shape2M
policy-map Shape4M
 class Citrix
  priority percent 80
  set dscp af41
 class Print
  bandwidth remaining percent 20
  set dscp af11
  random-detect
 class class-default
  bandwidth remaining percent 80
  random-detect
policy-map Shape1M
 class Citrix
  priority percent 80
  set dscp af41
 class Print
  bandwidth remaining percent 20
  set dscp af11
  random-detect
 class class-default
  bandwidth remaining percent 80
  random-detect
ip access-list extended Coolingata
 permit ip any 192.168.65.0 0.0.0.255
ip access-list extended Hawthorn
 permit ip any 192.168.76.0 0.0.0.255
ip access-list extended MonaVale
 permit ip any 192.168.34.0 0.0.0.255
int LAN0
 ip nbar protocol-discovery
int WAN
 service-policy output Global

Here is the config on our edge routers:

class-map match-any Citrix
 match access-group name Citrix-ACL
policy-map WAN
 class Citrix
  priority percent 80
  set dscp af41
 class class-default
  bandwidth remaining percent 100
  random-detect OR fair-queue (depending on router type)
policy-map Global
 class class-default
  shape average 20480000 (number equal to the link speed)
  service-policy WAN
ip access-list extended Citrix-ACL
 permit tcp any 192.168.1.0 0.0.0.255 eq 1494
 permit tcp any 192.168.1.0 0.0.0.255 eq 1604
int WAN
 service-policy output Global

As you can see we have different link speeds at different sites based on their size, location etc.

Can anyone see any issues, or confirm that this config is correct?

Your help is appreciated

Regards,

Joseph W. Doherty · ‎10-17-2012

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

Disable QoS features not needed. For instance, do you need NBAR protocol discovery? Do you really need WRED?

Sequence match lists. Do the class-map sites correspond to frequency of matches? Do the Citrix ACEs correspond to frequency of matches?

As some protocol (NBAR) matching looks deeper into a packet, beyond traffic type and port numbers, can you use an ACL instead? For example, instead of matching protocol Citrix, could you use an ACL?

Is CEF enabled?

Is Netflow caching enabled?
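
A minimal sketch of how these checks and the ACL substitution might look on the 2911. The ACL name Citrix-Out is hypothetical, and the sketch assumes the Citrix servers sit in 192.168.1.0/24 (the subnet the edge-router ACL points at) and that ICA on TCP 1494 is the traffic of interest. NBAR may recognize Citrix traffic that a port-based ACL misses, so this is worth verifying before swapping.

```
! Read-only exec checks: is CEF on, is the NetFlow cache in use,
! is NBAR discovery collecting, and what is using the CPU?
show ip cef summary
show ip cache flow
show ip nbar protocol-discovery
show processes cpu sorted
!
! Config sketch: stop NBAR protocol discovery if its statistics are not needed
interface LAN0
 no ip nbar protocol-discovery
!
! Config sketch: classify Citrix by port instead of NBAR
! (ACL name and server subnet are assumptions)
ip access-list extended Citrix-Out
 permit tcp 192.168.1.0 0.0.0.255 eq 1494 any
class-map match-all Citrix
 no match protocol citrix
 match access-group name Citrix-Out
```

Note that class Print still uses "match protocol", so NBAR stays active on the policy until that class is converted as well.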

pierrescotland · ‎10-18-2012

Your QoS config would appear to be similar to configurations I have worked on and I can confirm that CPU load does seem to increase slightly, especially matching with NBAR. I would be interested to hear from others on this.

By the way, for your Citrix traffic I notice you're matching TCP 1494. If you're using Session Reliability (which I think has been the default since version 4), you'll also have to match port 2598.
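
As a sketch, the edge-router ACL from the original post with the extra port added might look like this. The subnet and the first two entries are copied from the posted config; only the last entry is new.

```
ip access-list extended Citrix-ACL
 permit tcp any 192.168.1.0 0.0.0.255 eq 1494
 permit tcp any 192.168.1.0 0.0.0.255 eq 1604
 ! added: Session Reliability port
 permit tcp any 192.168.1.0 0.0.0.255 eq 2598
```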

cheers

Angelo ANELLO · ‎10-22-2012

Thank you both for your replies. I will test some modifications to the config and post back the results.

Regards,

Fedor Travinsky · ‎10-27-2015

Joseph,

Some time has passed since you answered :)

I'm trying to handle a 10+ Mbps flow with old 2811 hardware. With policies enabled, the CPU is choking.

QoS config is generic: WRED, classes with NBAR, incoming classification & marking and outgoing shaping on both interfaces (I shape ingress ISP traffic by bottlenecking opposite interface).

I got your recommendation to avoid NBAR (use ACL when possible).

I have NetFlow export enabled. Is that what you meant by caching? How does it affect the CPU?

What other mechanism could I use instead of WRED?

How should I order classes for the best CPU performance?

I took the CI-QoS course two years ago, and at that time had no idea I would face CPU performance issues.

Thanks.

Joseph W. Doherty · ‎10-28-2015

Hmm, about 6 to 10 years ago, I worked with quite a few 2811s, with a QoS policy, and they seemed to top out at about (aggregate) 40 Mbps (for our traffic), so I would expect it could support 10 Mbps, w/o choking. Of course, with router configs, "your mileage might vary".

BTW, shaping the opposite interface to try to control ingress bandwidth is almost totally ineffective. Policing is usually more effective, but even it has many limitations for that usage.
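
For comparison, a minimal sketch of an ingress policer. The policy name, interface name, and the 9 Mbps rate (roughly 90% of a 10 Mbps link) are all assumptions for illustration, not values from this thread's routers.

```
! Hypothetical names and rate; adjust to the actual link
policy-map Police-In
 class class-default
  police cir 9000000 conform-action transmit exceed-action drop
!
interface FastEthernet0/0
 service-policy input Police-In
```

A policer drops excess traffic immediately instead of queuing it, which is why it tends to signal senders sooner than a shaper on the opposite interface.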

My recommendation against NBAR depends much on how you're using it. Sometimes it's actually the same as an ACL but with a pretty face. Sometimes, though, it digs much deeper into the packet. If you really need it, you need it, but on a software based router, deeper inspection requires more work from the CPU.

NetFlow is a caching mechanism, and it does add to the CPU load.

What to use instead of WRED depends on what you're trying to accomplish. IMO there are so many issues when using WRED that, unless you're a QoS expert, I advise you to avoid it. It, too, will add to CPU loading.

You align CBWFQ classes, much like you align an ACL, i.e. most matches first. (This assumes there's no sequence dependency - again like an ACL sequence.)
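
One way to find which classes match most often is to read the per-class packet counters. A small sketch, with a hypothetical interface name:

```
! Exec mode; use the interface the policy is attached to
show policy-map interface FastEthernet0/1 output
```

The classes with the highest packet counts are the candidates to move toward the top of the policy-map, provided the match order does not change which class a packet lands in.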

Is your high CPU interrupt or process?

Fedor Travinsky · ‎10-28-2015

Thanks for the reply : )

I find such a hierarchy, with 80-90% of the bandwidth, quite effective on the opposite interface for suppressing incoming traffic:

policy-map QoS
 class class-default
  shape average percent 90
  service-policy CBWFQ <-everything else inside

LAN(config-if)# bandwidth ISP_SLA

LAN(config-if)# service-policy output QoS

It narrows the bandwidth a bit, but gives a smooth user experience. It mostly works ahead of the ISP's policer. Yet I have trouble with not shaping traffic running between sub-interfaces on the LAN side, but that's another story.

I guess I should not use NBAR in output (shape) policy? Only DSCP values and ACL?

#sh proc cpu sorted shows no consuming processes:

IP Input, HQF Shaper Backg and Per-Second Jobs are around 1.5%.

Joseph W. Doherty · ‎10-28-2015

I understand what you're doing, and why you're doing it, I'm just somewhat surprised you find it as effective as you note.

Some sending protocols, such as TCP, will slow their transmission rate when they detect dropped packets. Some of the latest TCP stacks will also slow their transmission rate when they see a jump in RTT. But if you're only slowing to 80 to 90 percent, the sender won't see much difference, as the upstream 100% bottleneck should have already dropped or queued.

You didn't post your IOS version or subordinate child policy, but pre-HQF shapers did FQ, and your subordinate child policy might also have FQ; either will often "smooth" out competing flows. But again, I'm a bit surprised you would see much of a difference with only a 10 to 20% delta.

But, if you say it works better, I believe you.

Anyway, nothing wrong with NBAR, but just remember deep inspection likely requires additional CPU, so it's more of a question what do you really need to examine. For example, the reason for the QoS tag is so that each hop only need look at it rather than applying a bunch of rules to determine QoS treatment.

Ok, your CPU is generally all interrupt, which is good, but how high and how long is CPU usage?

Fedor Travinsky · ‎10-29-2015

Right now I'm doing my best to optimize classes. Changes I made:

<...>

I trust ISP's and my own DSCP tags from tunnels (qos-preclassify)

class-map match-any Marked <- the first class for inbound policy
 match not dscp default

class-map match-any Mark_Call_Signalling <- that's for inbound marking
 !match dscp cs3 <- I mistakenly used one class both for mark and shape
 match protocol sip
 match protocol rtcp
 match protocol h323

class Mark_Call_Signalling
 set dscp cs3

class-map match-any Call_Signalling <- that's for outgoing shaping policy
 match dscp cs3

<...>

Next step will be replacing NBAR with ACLs on traffic-heavy classes.

Any idea how to move a class higher in a policy-map without using "no policy-map"?

Regarding traffic incoming from WAN (=outgoing to LAN):

"sender won't see much difference as the upstream 100% bottleneck should have already dropped or queued."

I handle WAN traffic from Internet. I guess there's no bottleneck on sender's side as it is actually a BitTorrent cloud or a fast web-server in datacenter.

My idea with the bottleneck is to decide what to start suppressing when incoming traffic reaches 90%, before the ISP's policer does the same randomly at 100% of SLA bandwidth : )

Joseph W. Doherty · ‎10-29-2015

"I trust ISP's and my own DSCP tags from tunnels (qos-preclassify)"

Tunnels and QoS-preclassify? QoS-preclassify allows the device with the tunnel interface to have policy that examines a copy of the pre-tunnel IP header on the same device's physical interface. As the ToS byte is generally copied from the pre-tunnel packet, you don't need QoS-preclassify to examine it and if you're on the device with the tunnel interface, you can often do QoS packet examination at the tunnel interface often negating the need for QoS-preclassify. If you're on the receiving end, unclear how QoS-preclassify is germane.
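
For reference, a minimal sketch of where QoS-preclassify would sit on the device that owns the tunnel. The interface names and the policy name WAN-Out are hypothetical; the policy is assumed to classify on inner-header fields such as addresses or ports.

```
! Hypothetical names; only useful on the tunnel head-end
interface Tunnel0
 qos pre-classify
!
interface FastEthernet0/0
 service-policy output WAN-Out
```

If the physical-interface policy only matches DSCP, the command adds nothing, since the ToS byte is generally copied to the outer header anyway.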

class-map match-any Mark_Call_Signalling <- that's for inbound marking
 match protocol sip
 match protocol rtcp
 match protocol h323

BTW, for efficiency (on software based devices), besides being careful how class-maps are sequenced within a policy-map, I believe there's slight benefit to also taking care how individual match clauses are sequenced.

"Next step will be replacing NBAR with ACLs on traffic-heavy classes."

Just remember NBAR can match things not possible with an ACL. The question is, is the additional match analysis needed. i.e. Don't just replace NBAR with an ACL without careful consideration what both are actually examining vs. what you need to examine.

"My idea with bottleneck is to decide what to start supressing when incoming traffic reaches 90%, before ISP's policer does the same randomly at 100% of SLA bandwidth : )"

Yes, again, I know exactly what you're trying and why. Also again, in my experience it doesn't work very well (that's not to say it doesn't work at all), and a shaper, since it queues, will often lag what can be done with a policer. There are several issues that make the technique less effective than we would like.

First, ingress can burst and saturate upstream before the downstream "sees" the congestion. Second, not all senders will react (slow) to just congestion, and some will not react (slow) to drops (much depends on the protocol). Lastly, the feedback loop can be slow, especially slow where a protocol scales up its transmission rate (e.g. TCP).

In my experience using this technique, I've sometimes found I have to really bottleneck the bandwidth to avoid upstream bandwidth saturation, to ensure there's available upstream bandwidth left for demanding traffic like VoIP. Of course, if your traffic doesn't have such demanding QoS requirements, you don't need to try to guarantee as much upstream bandwidth.

If you're running tunnels across an Internet connection, I've had great results by shaping on the tunnel sender's side, and by NOT sharing such an Internet connection with generic Internet traffic, i.e. have different Internet connections for tunnel traffic vs. generic Internet traffic.

PS:

If you're willing to try such an advanced QoS technique as managing upstream bandwidth from downstream, you might be interested to know that I've also tried shaping outbound ACKs. With TCP, this too can help control a TCP sender's transmission rate, but I also found a router cannot do it with enough granularity.

Fedor Travinsky · ‎10-29-2015

Thanks again for sharing your deep knowledge : )

Is there a way to change sequence of classes without killing the whole policy-map and starting from scratch?

Adding a new class sends it to the bottom : ( Removing all the classes requires specifying each class name.

Removing a policy-map that is in use is a bad approach.

I bet I need to reapply the modified service-policy to the interface (using the no command)?

(I've seen shaping go mad after real-time modifications, until a reload.)

Joseph W. Doherty · ‎10-29-2015

You don't have to drop the whole policy, but you do need to drop and reenter any class that should follow the class you're inserting.

For example, given:

policy-map example
 class 10
 class 20
 class 30
 class class-default

If you wanted to insert a class 5, you would need to remove all classes, but class-default, and enter classes 5, 10, 20 and 30. But if you wanted to insert class 25, you would only need to remove class 30, and then enter classes 25 and 30.
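
The second case (inserting class 25) can be sketched as follows. The class names are the ones from the example above; the comments stand in for whatever actions each class actually carries.

```
policy-map example
 ! removing the class also removes the actions configured under it
 no class 30
 class 25
  ! enter the actions for the new class here
 class 30
  ! re-enter the actions class 30 had before
```

Because "no class 30" discards that class's settings, it helps to save its current configuration before starting.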

Fedor Travinsky · ‎10-29-2015

I'm done with tuning.

I didn't replace NBAR; I just rearranged the classes and removed the duplicate NBAR matching from shaping (egress classes now use "match DSCP" only). Nothing else. Now I'm applying 240 lines of revised config (60 "no"s) to the branch offices.

Platform: 2821, 20 Mbps link.

Test 1: Torrent incoming without any speed limits.

Old QoS: 99%, 1.6 Mb/sec DL, choking.

Without QoS: ~30%, 2.2 Mb/sec DL. guess the ping and ISP's drop rate

Tuned QoS: 60%, 1.6 Mb/sec UDP, 1600 pps

class-map match-any Mark_Scavenger (part of WAN input policy)
 match protocol bittorrent

class Mark_Scavenger
 set dscp cs1

class-map match-any Scavenger (part of LAN output policy)
 match dscp cs1

class Scavenger
 bandwidth remaining percent 1
 random-detect dscp-based

Scavenger Class drop rate is around 30 pps / 300 kbps with 15 Mbps passing through.

Other classes without drops. Hope, ISP's policer didn't kill some packets.

Test 2: HTTP download (from nvidia.com).

class-map match-any Mark_Bulk_Data <- I respect HTTP and HTTPS : )
 match protocol secure-http
 match protocol http mime "image/*"
 match protocol http mime "text/html"

class Mark_Bulk_Data
 set dscp af11

class class-default (LAN output policy) <- everything else (including this HTTP download)
 bandwidth remaining percent 5
 random-detect dscp-based
 random-detect ecn <- I enabled ECN on my Windows infrastructure.

Tuned QoS: 55%, 1.9 Mb/sec TCP, 1500 pps (cs0, 5 pps / 50 kbps drop rate).

In both tests a 140-byte ping rises from 2 ms to <10 ms (ICMP is marked cs2). So I consider my bottleneck for ingress traffic a good solution.

Thanks for fixing my brain and for the support!

BTW, Paessler PRTG Monitor does a great job of displaying Class-Based QoS SNMP values and drop rates per class and interface.

Fedor Travinsky · ‎10-29-2015

Any idea how to match HTTPS traffic from *.googlevideo.com (YouTube)? (Answer is No, I'm just curious)

Does it make sense to use "random-detect dscp-based" in each class (that is tolerant of drops), if each class carries only one DSCP value?

Is it normal for a 2821 to have 25%/19% (total/interrupt) CPU at idle (1 Mbps, 250 pps)?

CPU utilization for five seconds: 26%/19%; one minute: 25%; five minutes: 25%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
138 181379580 75577202 2399 1.43% 1.87% 1.67% 0 IP Input
368 35831232 1411829227 25 1.19% 1.16% 1.14% 0 IP SLAs XOS Even
293 37287688 24909675 1496 0.79% 0.46% 0.45% 0 Crypto IKMP
369 39180680 25505069 1536 0.71% 0.50% 0.48% 0 IP SNMP
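
A short worked reading of the first line of that output, for anyone comparing interrupt and process load. In the "five seconds: X%/Y%" pair, X is total CPU and Y is the share spent at interrupt level (mostly packet switching):

```
five seconds: 26%/19%
  total CPU        = 26%
  interrupt level  = 19%
  process level    = 26% - 19% = 7%
```

That 7% is in line with the listed processes each sitting at 1-2% or less, so nearly all of the load here is packet forwarding rather than any single process.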

Joseph W. Doherty · ‎10-29-2015

Since the traffic is encrypted, perhaps the best you can do is recognize it's HTTPS and from a google video IP.
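
A rough sketch of that idea, with loud caveats: 203.0.113.0/24 is a documentation placeholder, not a real Google range, and the ACL and class-map names are hypothetical. The real source prefixes would have to be found and kept up to date, and they change.

```
! Placeholder prefix; replace with the actual video-server ranges
ip access-list extended GoogleVideo-HTTPS
 permit tcp 203.0.113.0 0.0.0.255 eq 443 any
!
class-map match-any Mark_Video
 match access-group name GoogleVideo-HTTPS
```

This matches on source address and port only, so any other HTTPS service hosted in the same ranges would be classified the same way.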

25% seems high for 1 Mbps. What IOS are you using?

PS:

Regarding WRED, I recommend in most instances to stay away from it.