05-11-2023 12:07 PM
Hi,
I am trying to troubleshoot an issue we're having with the ASR 9000s in our network, and while I've gained some understanding, I need some help. Ours is a transit network: we provide L2VPN services (VPWS or VPLS) to our customers. Some of them attach to our PEs (ASR 9000s) through an L2 bundle, and some of those PEs connect back toward the core over an L3 bundle.
One of the issues was that we often noticed the bundle utilization was VERY asymmetrical; this was eventually chalked up to the fact that no "l2vpn load-balancing flow src-dst-X" CLI was configured.
Without this CLI, the router doesn't appear to attempt any kind of load balancing: even in the presence of ECMP paths, it only ever uses one link. Only after that CLI is entered do we see the router actually balance traffic. This seems to hold true whether the interface(s) toward the backbone are an L3 p2p link or an aggregate.
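For reference, a minimal sketch of that configuration on IOS XR (shown here with the src-dst-ip variant; the available options vary by release):

```
l2vpn
 load-balancing flow src-dst-ip
!
```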
The problem we're facing is when there's not enough variance in source/destination MAC or IP. In a test environment, I set up two ASR 9000 boxes, each with a 2x10G bundle Ethernet on an A9K-24X10GE-1G-SE LC, and each also has an A9K-8HG-FLEX-SE LC. To one interface of the latter LC I attached an IXIA chassis for traffic generation.
Right now I am trying to send 15 Gb/s of traffic (single source/destination IP and MAC) across the 20 Gb/s bundle, but I cannot.
What I've observed is the following:
-Without the above CLI configured, the router won't even attempt to load-balance outgoing traffic toward the backbone; only one link is utilized to handle all of the incoming and outgoing traffic. This behavior was consistent whether the two interfaces were members of a bundle or each was used as a p2p L3 link toward the other router:
Interface     In(bps)     Out(bps)    InBytes/Delta  OutBytes/Delta
Te0/1/0/13    9.8G/ 98%   9.8G/ 98%   1.8T/2.4G      14.7T/2.4G
Te0/1/0/14    1000/  0%   1000/  0%   13.1T/538      241.1G/308
-Once the CLI above is entered, even on just one of the two 9ks, the behavior changes. Now the bundle starts using one interface to handle all of the incoming traffic and the other to handle all of the outgoing:
Interface     In(bps)     Out(bps)    InBytes/Delta  OutBytes/Delta
Te0/1/0/13    1000/  0%   9.8G/ 98%   1.2T/308       11.4T/2.4G
Te0/1/0/14    9.8G/ 98%   1000/  0%   10.4T/2.4G     241.1G/577
So after entering the CLI, the box does attempt some load balancing, in that it uses one link for all of the outgoing traffic and the other for all of the incoming, rather than a single link for both. However, since we're still trying to send more traffic than a single bundle member can carry, there are drops.
Is it possible in such a scenario to force the box to utilize the bandwidth that is available in the bundle in a more effective way?
05-11-2023 12:13 PM
Hi Thomas,
It seems like you have already identified the issue with asymmetrical traffic and have configured the "l2vpn load-balancing flow src-dst-X" command to achieve some level of load balancing. However, you are still experiencing drops when trying to send traffic that exceeds the capacity of a single member of the bundle.
In this scenario, one possible solution could be to implement Equal-Cost Multipath (ECMP) routing. ECMP allows traffic to be distributed across multiple links that have the same cost. This way, traffic can be load balanced across multiple links, which can help to maximize the use of the available bandwidth.
To implement ECMP, you can configure multiple equal-cost paths in your routing protocol. For example, if you are using OSPF, you can raise the multipath limit with the "maximum paths" command. You can also tune how traffic is distributed across equal-cost paths with the "cef load-balancing algorithm" command.
It's important to note that ECMP may require careful planning and testing to ensure that it is configured correctly, and the exact commands differ between platforms and IOS XR releases, so you should check the documentation for your specific platform.
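As a hedged sketch of the multipath limit on IOS XR (the OS on the ASR 9000), set under the routing-protocol configuration; process name and value are illustrative and exact syntax may vary by release:

```
router ospf 1
 maximum paths 8
!
```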
Please rate any helpful comments.
-MAK
05-11-2023 01:22 PM
ECMP is enabled by default, up to 32 ways in older code and 64 ways in newer code. That is not the problem; the issue reported here is a common TAC scenario we see with MPLS networks and especially L2VPN traffic.
Sam
05-11-2023 01:21 PM
An important note here: on the ASR 9000 we load-balance per flow, not per packet, which means that if all the traffic belongs to the same flow, it will always hash to the same link, causing an uneven distribution. This is mainly seen with tunneled traffic, where the outer packet encapsulation is always destined to the same PE router, so we hash it all to the same link.
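The effect can be illustrated with a toy model of per-flow hashing (this is not the actual NPU hash, just a sketch): a single flow always produces the same hash and therefore lands on the same bundle member, while a population of distinct flows spreads across the members.

```python
import hashlib

def pick_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Toy per-flow hash: build a key from the flow tuple and map it
    deterministically onto one of n_links bundle members."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return digest[0] % n_links

# A single flow polarizes onto exactly one member of a 2-link bundle,
# no matter how many packets it carries.
single_flow = {pick_link("10.0.0.1", "10.0.0.2", 1000, 2000, 2) for _ in range(100)}
assert len(single_flow) == 1

# Many distinct flows (varying source port here) spread across both members.
many_flows = {pick_link("10.0.0.1", "10.0.0.2", sp, 2000, 2) for sp in range(1000, 1100)}
assert len(many_flows) == 2
```

This is exactly why the 15 Gb/s single-flow test cannot exceed one 10G member: with one flow there is only one hash result, regardless of the algorithm.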
If the traffic is MPLS traffic, which includes pseudowires, then enabling FAT (flow labels) will help add diversity to the hashing. Also, if we know the source address is more diverse than the destination, we can change the hashing to use the source rather than the destination. (This is your problem: the outer headers have the same src and dst IP and MAC, so they always hash the same, and you likely have the same labels, or not enough variance in labels, which is where FAT or control word comes in. In addition, you can set the l2vpn load-balancing as you did, which also helps, but because the outer information of encapsulated MPLS traffic is the same, you will see this behavior until you modify the default hashing on the box in some manner.)
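A hedged sketch of enabling FAT on a pseudowire class in IOS XR (the class name is illustrative, the exact nesting varies by release, and the flow label generally needs to be enabled or negotiated on both ends of the PW):

```
l2vpn
 pw-class PWC-FAT
  encapsulation mpls
   load-balancing
    flow-label both
   !
  !
 !
!
```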
Another option is to change the CEF load-balancing algorithm. There are several hashing profiles (up to 32) you can try to see which gives you the most even traffic distribution.
Example:
config
cef load-balancing algorithm adjust <value>
commit
end
(If you change the CEF load balancing, keep in mind that it is a global command, so traffic distribution will change on all interfaces in the system. It is not service impacting, as it only changes how flows are hashed.)
As mentioned before, our load balancing is per flow and not per packet, so how traffic is distributed depends entirely on the flows being sent.
Let me share the following document, which describes the above scenarios, and why we see traffic polarization in each, in excellent detail:
ASR9000/XR: Load-balancing architecture and characteristics
https://supportforums.cisco.com/t5/service-providers-documents/asr9000-xr-load-balancing-architecture-and-characteristics/ta-p/3124809
Thanks,
Sam