Traffic Shaping design and advice

Hello Experts @Leo Laohoo  @Rob Ingram  @balaji.bandi 

I have SD-WAN implemented and am working on designing a traffic shaping policy.

There are two parts. The first is the WAN policy from the underlay for Internet access, which covers applications like Teams, Zoom, SIP/VoIP, and other UDP-based traffic. Do we need any policy for Outlook or anything else?

The second part is the overlay for internal applications, such as phones accessing internal phone servers, particular applications accessing internal servers, etc.

Please provide suggestions on how to design and implement this, and the key things to include.

How to leverage new/more applications in the future without disturbing the existing setup.

Thanks

 

 

5 Replies

Joseph W. Doherty
Hall of Fame

As you didn't mention me as an expert (even if I am a QoS expert legend in my own mind - laugh), I hope you don't mind if I inject my 2 bits' worth.

Firstly, it's unclear what/how you see yourself using traffic shaping. If you have something in mind, such as shaping some traffic to preserve bandwidth for other traffic, that can be done, but as shaping limits bandwidth all the time, I prefer prioritizing bandwidth based on application needs.

The only time I usually shape traffic is when I'm dealing with some QoS control point that has more immediate (e.g. port) bandwidth than I know is available further along the path (e.g. a CIR less than the port bandwidth). Such a shaper creates an artificial bandwidth restriction where I can apply QoS effectively for the path's bandwidth.
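For example, a minimal sketch of that kind of parent shaper with a child queuing policy (the 50 Mbps rate, interface, and policy names here are purely illustrative assumptions, not anything from your setup):

policy-map ShapeToCIR
 class class-default
  ! shape to the known downstream (e.g. CIR) rate, in bps
  shape average 50000000
  ! apply the actual queuing policy within the shaped rate
  service-policy GenericModel
!
interface GigabitEthernet0/0
 service-policy output ShapeToCIR

(GenericModel being a queuing policy such as the one further below.)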

You mention Internet access.  Of course, the Internet doesn't generally honor/support QoS bandwidth management.

Downstream QoS bandwidth management (like receiving traffic from other Internet sites you have no control over) can be somewhat effective, but generally requires a "special" device like the (now defunct?) Packeteer appliances.

Site-to-site QoS bandwidth management across the Internet can be (almost) as effective as having a "private" WAN.  However, much like a private WAN, you need to be able to fully manage traffic between your Internet connected sites.

For example, between two of your sites, connected via Internet, you should be (mostly) able to deliver necessary performance for Zoom, VoIP, etc.

Between one of your Internet-connected sites and another Internet-connected site over which you have no bandwidth management control, service levels are often practically impossible to provide without a "special" appliance, and even with a "special" appliance you cannot guarantee service levels at the same level as when you control both Internet ends.  (Again, as the Internet doesn't guarantee any service level, you cannot reach the same guarantee level as with a private WAN.)

"How to leverage new/more applications in future without disturbing the existing setup."

That's best accomplished, I believe, by using a very generic QoS model, where you only need to correctly direct traffic to the "right" class and ensure you have the bandwidth actually needed to support your classes.

My generic model is:

policy-map GenericModel
 class real-time
  ! e.g. VoIP bearer, video conferencing (apps with "known" bandwidth usage limits)
  priority percent 35
 class Hi-priority
  ! e.g. VoIP signaling (apps with "known" bandwidth usage limits)
  bandwidth remaining percent 81
  fair-queue
 class Low-priority
  ! e.g. scavenger, bulk data replication (e.g. database copies, email [server-to-server] transfers)
  bandwidth remaining percent 1
  fair-queue
 class class-default
  ! e.g. most all traffic, i.e. none of the above
  bandwidth remaining percent 9
  fair-queue
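To actually use such a model, you also need class-maps that direct traffic into those classes, plus the policy attached to an interface. A minimal sketch (the DSCP markings and interface name are illustrative assumptions; you might instead match on ACLs, NBAR protocols, etc.):

class-map match-any real-time
 match dscp ef
class-map match-any Hi-priority
 match dscp cs3
class-map match-any Low-priority
 match dscp cs1
!
interface GigabitEthernet0/0
 service-policy output GenericModel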

I've posted the above model many times over the years. What I may not have mentioned is that, when I used such a model, I would adopt techniques to try to determine what level traffic should be in while disregarding the "classical" method, i.e. put this kind of traffic here and that kind of traffic there.

For example, suppose you have both HTTP and FTP traffic. "Classically", as HTTP is likely interactive, it's "better" than FTP traffic for bandwidth prioritization. But consider: if an HTTP session is downloading some 100 MB app update file vs. an FTP session downloading a readme.txt file containing 100 characters, which is really "better", especially if the FTP session is being used interactively?

Or, consider two users using telnet, both interacting with a Cisco router connected to the Internet. One user is doing "typical" console interaction; the other has just turned off screen pagination and is listing the whole Internet routing table (while logging their telnet session output). Should these two users get exactly the same service level guarantees? (Personally, I believe they should not.)

So, I believe a dynamically driven service policy, based on bandwidth usage, avoids many problems of how to treat different applications, both now, and into the future.  (One of the reasons I really, really like FQ/WFQ.)

It's also somewhat nice when you start dealing with crypto traffic.  I cannot see "what" your traffic is, but I can "see" how demanding you are for bandwidth.

To be clear, being dynamic doesn't totally eliminate using application type and/or ToS tags, but it does, I've found, simplify policy management. E.g., if desired, knowing that one app is telnet and the other FTP, and also knowing that the telnet flow is moving lots and lots of data while the FTP flow is moving very little, I might prioritize this particular FTP flow over the telnet flow.

For me you are the best in QoS.  

Well thank you @MHM Cisco World.  That's very kind.

Actually, it's unlikely I'm the "best" in QoS, but having worked QoS extensively in a good-sized international WAN, I believe I figured out a few things which don't seem to be well presented in typical QoS materials.

Also, actually, much of my (personal) underlying QoS "theory" is plagiarized, not from QoS materials, but from computer resource management, especially sharing CPUs and/or disk drives. In some ways, apps sharing bandwidth is like apps sharing a CPU. Yes, often some apps are prioritized over other apps, but how do (more-or-less) equal-priority apps share a CPU?

Way, way back, FIFO was used for some CPU sharing, but you got very unpredictable results for any particular app instance (much like the "typical" FIFO sharing of network bandwidth still used today).

For years now (decades, actually), CPU sharing has often been much more involved than simple FIFO and/or simple PQ, yet, again, what's the default queuing for most interfaces? FIFO. Or, yeah, we can PQ VoIP, but beyond that, we now have 12-class models and/or optionally 3 priority drop levels; but if your "pipe" has more than 70% utilization, how do your users perceive network performance?

Something I don't recall posting on these forums is an argument I would make for QoS: when someone works with a 1 KB file across a network versus working with a 50 MB file across the network, they expect the former to take less time than the latter. And it does, assuming two users aren't doing both things concurrently. But if two users are doing both things concurrently, sometimes the 50 MB user's interaction will "crush" the 1 KB user's interaction. This is the kind of situation QoS can preclude.
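To put rough (purely illustrative) numbers on that, assuming a 10 Mbps link with 1 MB of the bulk transfer already sitting in a FIFO queue: 1 MB is 8 Mb, so the 1 KB user's packet waits about 0.8 seconds just for the queue ahead of it to drain. With per-flow fair queuing, the small flow's packets are interleaved with the bulk flow's, so the 1 KB (8 Kb) transfer completes in a few milliseconds even while the 50 MB transfer continues.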

BTW, years ago, there were studies where predictability and/or consistency were very important to users using a system.  I.e. a consistent 5 second response was better than a 1 second response most of the time, but with occasional 30 second responses.  I.e. if you had to, you would artificially slow the 1 second response to 5 seconds so as to guarantee no 30 second responses.

I can see the logic of this, but personally, I would question why there are 30 second responses and work to mitigate them. To me, if most responses could still be 1 second, with an occasional 3 second response, I think that's better than all responses being 5 seconds.

Laugh, but I also believe and say computer systems should wait on people, people should not wait on computer systems.

Something I believe I've posted before: in the business where I had implemented QoS, the WAN lead considered QoS just voodoo. Well, until the day one WAN core router rebooted itself overnight and dropped its QoS configuration. The WAN lead later said that, the morning he came in, his telephone was lit up like a Christmas tree with remote sites complaining: what's wrong with the network?

He found the WAN router that had rebooted, discovered its missing QoS statements, reapplied them, and the telephone calls stopped! (BTW, this was before we were running VoIP or video conferencing apps - we did add them a few years later, no big deal.)

I also believe I've posted this before: this international company was split into 3 semi-autonomous regions. Our region (the only one with QoS) didn't have remote users constantly complaining about network performance. Further, the two other regions were constantly upgrading WAN bandwidths, which "surprisingly" (to them) didn't seem to stop the continuous network performance complaints within those two regions.

I did try to convince the two other regions to try QoS, but they felt it wasn't needed, as (again) at that time we weren't supporting any real-time traffic like VoIP, and "we know" you only really need to consider QoS when supporting VoIP, etc.

(Oh, yeah, and when we did start to support VoIP and video conferencing, those other two regions did implement QoS. Using what I had worked up over 5 years? No, let's copy examples from Cisco [?], that should be what we need. [Cisco, at that time, I believe was still using their Olympic model. Well, LLQ/PQ for real-time traffic was good, although I believe their users still complained about their other network app performance, but it wasn't any/much worse, so not really a problem.] They did allow our region to continue to use our own QoS policy.)

Sorry, just realized I'm on my QoS soapbox.  Why stop now . . .

For years, "we need" FE, gig, 5gig, 10gig, 25gig, 40gig, 100gig, and ???, but do we need QoS?

"Well maybe for VoIP and/or video conferencing".

Oh, and if you do use QoS, here's our 12-class model where you provide 11% bandwidth for this class, and 2% bandwidth for that class, and use RED on this class beginning drops at 112 packets with 100% drops at 838 packets if in the 2nd drop probability class, but for this class . . .  (Laugh, and I wonder why QoS is considered complex and/or voodoo.)
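For readers who haven't seen that kind of tuning, here's a sketch of what such per-class WRED looks like (the class name and thresholds simply mirror the numbers above and are illustrative only, not a recommendation):

policy-map TwelveClassExample
 class Bulk-Data
  bandwidth percent 11
  random-detect
  ! precedence 2 traffic: random drops begin at a queue depth of 112 packets; all packets drop above 838
  random-detect precedence 2 112 838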

Anyway, possibly no surprise, but I believe QoS is very much underappreciated for what it can do (and also much undersupported by current technology).

Hello Joseph 


@Joseph W. Doherty wrote:

which don't seem to be well presented in typical QoS materials.

Oh, and if you do use QoS, here's our 12-class model where you provide 11% bandwidth for this class, and 2% bandwidth for that class, and use RED on this class beginning drops at 112 packets with 100% drops at 838 packets if in the 2nd drop probability class, but for this class . .



So true... I was reading a QoS tech note last night and I nearly fell asleep, it was so ambiguous.

That same query I posted not so long back, which you assisted with, I also sent to our Cisco TAC - the first ever time for a QoS query - and unbelievably, to my surprise, they weren't very helpful. It was as if it were a subject they didn't want to get mithered with, but I knew if I came on here there would be a high probability I'd at least receive some good advice I could work with.

So I agree with @MHM Cisco World - your knowledge of the subject seems second to none, and we are glad you support these forums with your expertise, mate.


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

@paul driver, thank you too for your kind words.

Yeah, come to reflect on it, I've usually found TAC ineffective in providing help or information about QoS. I suspect they just don't have either the information or the knowledge at hand about QoS. (BTW, I don't doubt that, somewhere in the bowels of Cisco, there are some real QoS experts; just not in the TAC.)

Much of my knowledge about the actual workings of Cisco QoS has come from "black box" observations, or from QoS nuggets buried in Tech Notes, whitepapers, Cisco Live presentations, or sometimes info about new release features documenting why the "new" is better than the "old".

For example, early on, while working out actually useful Cisco QoS, I wasn't getting the results I expected/desired. I found a Tech Note concerning tuning tx-ring-limit on ATM interfaces (this one, I recall: Understanding and Tuning the tx-ring-limit Value) (NB: I wasn't using ATM interfaces, though). The note described how you might need to adjust (down) the tx-ring-limit size, since the tx-ring is an interface-level FIFO queue sitting after your software QoS queues. I reduced my interfaces' tx-ring sizes, and QoS started to deliver the expected/desired results!
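For reference, the adjustment itself is tiny; a sketch in line with that Tech Note (the interface, PVC, and value are illustrative, and where the command is accepted - under the interface or, for ATM, under the PVC - varies by platform):

interface ATM0/0
 pvc 1/32
  ! shrink the interface's hardware FIFO so the software QoS queues do the sorting
  tx-ring-limit 3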

(Why was this documented for ATM? My guess: ATM did have a design aim of carrying things like voice, i.e. service-level management was possibly more of a real concern there.)

I could go on and on, but likely have sidetracked this thread already too much.

Apologies to the OP, and, again, if you can further express how you see yourself using traffic shaping, I might be able to provide more specific recommendations.
