
Cisco ASR920 traffic shaping and policing fair-queue

BasV
Level 1

Hello,

 

We have a Cisco ASR920 router connected to our ISP's 500 Mbps (symmetrical up and down) connection on port GigabitEthernet0/0/11.

 

On our LAN side we have multiple customers, each with their own IP. We have set up some basic traffic shaping and policing with the intention that no single customer can ever take up the full 500 Mbps and cause packet loss or connection issues for the other customers.

 

We currently set it up like this:

 

policy-map shaping-isp
 class class-default
  shape average 524288000   
policy-map policing-isp
 class class-default
  police 524288000 conform-action transmit  exceed-action drop 

interface GigabitEthernet0/0/11
 description ISP
 ip address ***.***.***.*** ***.***.***.***
 media-type rj45
 negotiation auto
 service-policy input policing-isp
 service-policy output shaping-isp
!

However, I now see regular packet loss, even when internet usage is only between 50 and 100 Mbps.

 

I also tried to add "fair-queue" to class class-default in both policy-maps; however, the router doesn't recognise the command.

 

Please advise me on how to set this up properly, and what my issue might be.


Regards,

Bas

18 Replies

Hello,

 

I am not really clear on your policy, as you say you have different customers. How much bandwidth is each customer supposed to get?

 

Either way, in the example below, each customer would get 100 Mbps, and if the remaining bandwidth is not being used by other customers, the full 500 Mbps can be used:

 

ip access-list extended CUSTOMER_1_ACL
 permit ip 192.168.1.0 0.0.0.255 any
!
ip access-list extended CUSTOMER_2_ACL
 permit ip 192.168.2.0 0.0.0.255 any
!
class-map CUSTOMER_1_CM
 match access-group name CUSTOMER_1_ACL
!
class-map CUSTOMER_2_CM
 match access-group name CUSTOMER_2_ACL
!
policy-map CHILD_SHAPER
 class CUSTOMER_1_CM
  priority 100000
 class CUSTOMER_2_CM
  priority 100000
 class class-default
  fair-queue
!
policy-map PARENT_SHAPER
 class class-default
  shape average 500000000
  service-policy CHILD_SHAPER
!
interface GigabitEthernet0/0/11
 description ISP
 ip address ***.***.***.*** ***.***.***.***
 service-policy output PARENT_SHAPER

 

 

Hello,

 

We don't have a fixed amount of bandwidth per customer. It also isn't really an option for us to manually create a rule for each customer, as we currently have around 250 (each with their own IP) and the number is still growing. We have 1024 IP addresses (of which around 250 are currently in use), so I'd have to create 1024 manual policies; I was not planning to do that.

 

That's why our intention is not to set static limits per IP, but to allow all IPs to use all bandwidth, except that if, for example, 2 customers both want the full bandwidth, I want each of them to be given a fair share.

 

Regards,

Bas

Hello Bas,

 

understood. However, (obviously) if you have multiple customers competing for the same bandwidth, some sort of classification needs to take place in order to somehow limit what each customer can get. 1024 classes indeed seems excessive; maybe you can group IP addresses together?
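As a rough sketch of such grouping (the names and addresses below are placeholders; substitute your own ranges), if your 1024 addresses form a contiguous /22, one ACL per /26 block would reduce 1024 per-IP classes to 16 per-group classes:

ip access-list extended CUSTOMER_GROUP_A_ACL
 permit ip 203.0.113.0 0.0.0.63 any
!
ip access-list extended CUSTOMER_GROUP_B_ACL
 permit ip 203.0.113.64 0.0.0.63 any
!
class-map CUSTOMER_GROUP_A_CM
 match access-group name CUSTOMER_GROUP_A_ACL
!
class-map CUSTOMER_GROUP_B_CM
 match access-group name CUSTOMER_GROUP_B_ACL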

 

That said, the below would just shape to 500 Mbps on your WAN link, with drops only occurring when those 500 Mbps are used up. You don't need (or want) any input policing.

 

policy-map SHAPE_500_PM
 class class-default
  shape average 500000000
  fair-queue
!
interface GigabitEthernet0/0/11
 description ISP
 ip address ***.***.***.*** ***.***.***.***
 service-policy output SHAPE_500_PM
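Once attached, the standard MQC counters will show whether the shaper is actually classifying, queuing, and dropping (this is the usual verification command; the interface name is taken from your post):

show policy-map interface GigabitEthernet0/0/11 output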

Hello,

 

Each client has its own IP, so I don't think grouping would really help. Isn't it the case that shaping will share the bandwidth among different connections automatically? I think that would be sufficient.

 

Also, I would like to add the "fair-queue" command, but the router doesn't recognise it under class class-default.

 

I also just changed the shaping to policing because of the packet loss, and the packet loss now seems a lot less, even though we're not even close to 500 Mbps, rather around 50-100 Mbps. Any idea why that is? And would you recommend shaping or policing in our case?

 

The reason we have input policing is that I hoped to somewhat control the inbound bandwidth with it, so that it drops packets to slow down inbound TCP traffic in case other customers need the bandwidth.

 

Regards,

Bas

Hello,

 

shaping is usually better than policing, because with policing, excess packets just get dropped, whereas with shaping, traffic bursts that exceed those 500 Mbps get smoothed out rather than automatically dropped.

 

That said, if you have 500 Mbps and you experience packet loss at 50-100 Mbps, you probably want to get in touch with your ISP. What if you shape to 100 Mbps instead of 500 (using the sample config I suggested)?

 

In order to test what you actually get, I would remove the entire service policy altogether and then use a simple online tool such as 'fast.com' to measure your speed. It wouldn't be the first time that an ISP tells you that you have 500 Mbps but in reality it is less.
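For the test window, removing the policies is just a matter of (policy names taken from your original post):

interface GigabitEthernet0/0/11
 no service-policy input policing-isp
 no service-policy output shaping-isp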

 

Also, can you post the output of the following command?

show interfaces GigabitEthernet0/0/11

 

 

Hello,

 

The way I see it, it's pretty clear that the shaping I set up didn't work properly: with the shaping I would get packet loss, and with the policing I would not. So to me it pretty clearly seems like an issue in my shaping configuration, not the ISP.

 

I can't really do heavy bandwidth testing now during business hours, as I don't want to slow down production traffic. Please see the output of the show interfaces command below; note, though, that it's still on the policing:

GigabitEthernet0/0/11 is up, line protocol is up
  Hardware is 12xGE-2x10GE-FIXED, address is ****.****.**** (bia ****.****.****)
  Description: ISP
  Internet address is ***.***.***.***/**
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 18/255, rxload 11/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full Duplex, 1000Mbps, link type is auto, media type is RJ45
  output flow-control is unsupported, input flow-control is on
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:11, output 00:00:05, output hang never
  Last clearing of "show interface" counters 1d20h
  Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 2517078
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 45568000 bits/sec, 11192 packets/sec
  5 minute output rate 72696000 bits/sec, 12328 packets/sec
     1311631008 packets input, 638005081401 bytes, 0 no buffer
     Received 0 broadcasts (0 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 10 multicast, 0 pause input
     1363197283 packets output, 1185550316993 bytes, 0 underruns
     Output 2 broadcasts (0 IP multicasts)
     0 output errors, 0 collisions, 0 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

The thing that I notice is that the queueing strategy is fifo, but I believe it should be possible to set up a fair queue too; I was, however, not able to configure that.

 

Regards,

Bas

Joseph W. Doherty
Hall of Fame

Could you clarify why you're policing ISP ingress? (Because, being downstream, either your ISP has already enforced your half-gig cap or they have let the traffic through. If the latter, and you received it, why discard it? [There can be reasons for such policing, but depending on why you're doing it, often you don't get the results you desire.])

Some shapers, I suspect, implicitly FQ outbound traffic, but to help ensure that, what you might be able to configure is:

policy-map shaping-isp
 class class-default
  fair-queue
  shape average 524288000

Some possible issues, with the shaping, include:

I suspect some devices don't "count" L2 overhead for policing or shaping, so you need to allow for it. Usually, shaping about 15% "slower" than your bandwidth cap covers that situation.
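As a rough sketch of that allowance against your 500 Mbps cap (the exact percentage really depends on your average packet size):

policy-map shaping-isp
 class class-default
  ! 500,000,000 bps x 0.85 = 425,000,000 bps
  shape average 425000000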

Using FQ and/or shaping may adjust the maximum number of queued packets for the class as a whole and/or for each flow queue. (What happens generally varies per platform, and perhaps per IOS version. Some platforms have class commands that allow you to adjust one or both of these maximum queued-packet allowances; see the sketch below.)
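On platforms that accept it, that adjustment looks something like the following (the value is purely illustrative; units and whether the command is accepted at all vary by platform and IOS version):

policy-map shaping-isp
 class class-default
  shape average 425000000
  ! enlarge the class queue to better absorb bursts (illustrative value)
  queue-limit 1000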

BTW, although you see packet loss at only 50 to 100 Mbps, that's an average usage over some time period, often, by default, 5 minutes. If you have interior ingress interfaces that can aggregate to over a gig toward the egress interface, you can have queue overflows that happen within milliseconds. Too-small maximum packet allowances in queues will, of course, overflow sooner.
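To see bursts hiding under that average, you can shorten the interface counter interval from the 5-minute default to 30 seconds (a standard IOS interface command):

interface GigabitEthernet0/0/11
 load-interval 30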

 

PS:

Oh, forgot to mention: FQ doesn't actually allocate a queue per flow; it allocates a fixed number of flow queues and distributes active flows across them (somewhat, I believe, like EtherChannel load balancing). (Ideally, only one active flow lands in any one flow queue.) I recall (?) some platforms also support a class command to adjust the number of FQ flow queues allocated.
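On the platforms I recall supporting it, that adjustment is a parameter of the fair-queue class command itself, along these lines (illustrative; the count usually must be a power of 2, and, as you've seen, your platform may reject fair-queue altogether):

policy-map shaping-isp
 class class-default
  shape average 425000000
  ! request 256 flow queues instead of the default (where supported)
  fair-queue 256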

Hello,

 

The idea behind policing ingress traffic is that I hoped it would give each source or traffic flow/connection a fair part of the bandwidth and drop the rest, so that TCP connections would slow down.

 

Setting up the fair-queue won't work:

RT01(config-pmap-c)#fair-queue
                    ^
% Invalid input detected at '^' marker.

This is under the class-default. So what's wrong here? Maybe the ASR920 doesn't support this?


Regards,

Bas

Hello,

 

I just tried setting it back to shaping (it was on policing temporarily). I set it to 425m (500 minus 15%) and did a speed test; other connections were immediately getting packet loss.

 

Regards,

Bas

I just tried something else: I set it back to shaping, but removed the shape average and set the bandwidth command as follows:

policy-map shaping-isp
 class class-default
  bandwidth 425000

Now I don't see a lot of drops during a speedtest. However, when I run a speedtest from 2 different customers / servers at the same time, one gets 10 Mbps and the other gets 415 Mbps. I had hoped the router would give each of them a fair share.

 

We used to run pfSense, where we set up flexible limits as explained in this article: Flexible vs. Fixed Limiters & Troubleshooting with pfSense 2.2.x : PFSENSE

I really hope something like that is possible with the Cisco routers.

"Now I don't see a lot of drops during a speedtest."

That's not unexpected, as you now should be using all of the port's gig bandwidth. The bottleneck would be with your service provider, and I don't know how they treat over-rate traffic. It's possible that whatever they are doing accounts for the unequal bandwidth usage between your two customers / servers.

With:

policy-map shaping-isp
 class class-default
  bandwidth 425000

Such a class configuration has one single FIFO class queue, which will only queue traffic that exceeds port speed.  A class bandwidth statement sets a minimum bandwidth guarantee (actually a dequeuing ratio relative to other classes).

If your link has a service provider cap of 500 Mbps, you need to shape if you want to manage possible congestion at 500 Mbps on a gig interface.

I've previously posted what your policy-map class configuration might be like, with additional usage notes.
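That is, combining the earlier pieces, something along these lines (the same sketch as before, plus the 15% allowance; whether fair-queue is accepted is the open question on your platform):

policy-map shaping-isp
 class class-default
  fair-queue
  shape average 425000000
!
interface GigabitEthernet0/0/11
 service-policy output shaping-isp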

Hello,

 

We have 2 ISPs (we have 2 of those routers, but to make it simpler I only gave the details needed to solve the issue; I can easily replicate the solution to the 2nd router (and ISP) later). I don't want to rely on ISP traffic shaping, because then we also don't have the freedom to move to other ISPs.

 

We announce our own IP space with BGP. Each customer has its own public IP from that IP space on the LAN side of the router.

 

I obviously don't want a single FIFO queue. I ran tests for a while, but with any configuration I am still getting packet loss, more than what we used to have with the dynamic queueing in our old pfSense.

 

The policy-map you posted previously had the fair-queue command in it, but that doesn't work, as I explained; I believe that's the most important part of it...

 

If there is really no other way, I believe I will have to write a script to create a policy map with each IP in a separate class. Is this really a good way to do it? That means I'll have 1024 rules; is that really how it should be done?

 

Regards,

Bas

Hello Bas,

 

if you really want to get to that level of granularity and limit each individual IP address, 1024 classes appear to be the only option.
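To give an idea, a generated policy would repeat a pattern like the one below for every address (the names, addresses, and the bandwidth remaining ratio value are placeholders; you would also want to check the platform's per-policy class limit first):

ip access-list extended CUST_IP_1_ACL
 permit ip host 203.0.113.1 any
!
class-map CUST_IP_1_CM
 match access-group name CUST_IP_1_ACL
!
policy-map PER_IP_CHILD
 class CUST_IP_1_CM
  ! equal share of leftover bandwidth for every customer class
  bandwidth remaining ratio 1
 ! ... repeated for each of the 1024 addresses ...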

 

I think there are (third-party) bandwidth management solutions out there; I am not sure if Cisco Prime can do it. They are rather expensive, as far as I recall.

Sorry for the delayed response - I somehow missed your 8/23 reply.

"The policy-map you posted previously had the fair-queue command in it, but that doesn't work as I explained, I believe that's the most important part of it..."

Could you again explain further how you're sure FQ doesn't/didn't work?

Reason I ask: there are lots of possible issues, beyond FQ not working correctly, that could explain your test results. For example, if you're using TCP-based flow tests, any packet drops on one flow can impact its transmission rate, and unless you're monitoring packets sent and received, and their timings, you cannot tell whether something is happening to a flow's packets within the ISP cloud. (BTW, not saying that's the case here, but there have been quite a few times I've had to "prove" an issue within a provider's network when they said they didn't have one. The "worst" example was with a tier-one provider; it took months for them to "find" the issue, since there "wasn't a problem" in their network. It turned out to be old/buggy firmware on one of their interface modules, which introduced a very small drop percentage when port utilization neared 100%.) Further, with TCP flows, things like how the ACKs are being handled and/or the RWIN on receiving hosts come into play, etc.

"I obviously don't want a single FIFO queue. I ran tests for a while but still, with any configuration, I am getting packet loss, more than what we used to have when we set up the dynamic queueing in our old pfSense."

Alas, that might be true. FQ has its own defaults which, if the platform does not allow you to override them, can introduce more drops due to insufficiently sized queues. At least the global FIFO queue has, generally, for years, supported adjusting its possible size.

BTW, even when there are software-based queues, they only handle traffic that overflows the interface's TX ring queue, which is FIFO. Sometimes, to get the best behavior from the software-based queues, you need to downsize the TX ring queue. (I often downsize it to about the minimum, but that can create other issues.)
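Where a platform exposes it (many don't, and the ASR920 may well be one of them), that knob is an interface command along these lines (illustrative value):

interface GigabitEthernet0/0/11
 tx-ring-limit 4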

But, again, it's unclear a) that FQ, on your platform, is working incorrectly, and b) that it cannot be "adjusted/tweaked" for your needs.

I will say, on the whole, Cisco platforms seem to work as they're supposed to more often than some other network vendors', and also, on the whole, when they don't, Cisco often fixes its bugs.
