on 08-28-2012 07:08 AM - edited on 11-22-2022 08:23 AM by nkarpysh
This document discusses how the ASR9000 decides between multiple paths when it can load-balance. This includes IPv4, IPv6, and both ECMP and Bundle/LAG/Etherchannel scenarios in both L2 and L3 environments.
The load-balancing architecture of the ASR9000 can seem a bit complex due to the two-stage forwarding the platform has. The scenarios in this article explain how a load-balancing decision is made so you can architect your network around it.
This document assumes you are running at least XR 4.1 (XR 3.9.X is not discussed); where applicable, XR 4.2 enhancements are called out.
ASR9000 has the following load-balancing characteristics:
The way they tie together is shown in this simplified L3 forwarding model:
NRLDI = Non Recursive Load Distribution Index
RLDI = Recursive Load Distribution Index
ADJ = Adjacency (forwarding information)
LAG = Link Aggregation, eg Etherchannel or Bundle-Ether interface
OIF = Outgoing InterFace, eg a physical interface like G0/0/0/0 or Te0/1/0/3
What this picture shows you is that a Recursive BGP route can have 8 different paths, pointing to 32 potential IGP ways to get to that BGP next hop, and EACH of those 32 IGP paths can be a bundle which could consist of 64 members each!
The architecture of the ASR9000 load-balancing implementation centers on the fact that the load-balancing decision is made on the INGRESS linecard.
This ensures that we ONLY send traffic to the LC, path or member that is actually going to forward the traffic.
The following picture shows that:
In this diagram, let's assume there are 2 paths via the PATH-1 on LC2 and a second path via a Bundle with 2 members on different linecards.
(note this is a bit contrived, considering that a 2-member bundle and a single physical interface can't mathematically form equal-cost paths)
The ingress NPU on LC1 determines, based on the hash computation, that PATH-1 is going to forward the traffic; traffic is then sent to LC2 only.
If the ingress NPU determines that PATH2 is to be chosen, the bundle-ether, then the LAG (link aggregation) selector points directly to the member and traffic is only sent to the NP on that linecard of that member that is going to forward the traffic.
Based on the forwarding architecture you can see that the ADJ points to a bundle, which can have multiple members.
With this model, LAG table updates (members appearing/disappearing) do NOT require a FIB update at all!!!
In order to determine which path (ECMP) or member (LAG) to choose, the system computes a hash. Certain bits out of this hash are used to identify member or path to be taken.
8-way recursive means that we are using 3 bits out of that hash result
32-way non recursive means that we are using 5 bits
64 members means that we are looking at 6 bits out of that hash result
It is system defined, by load-balancing type (recursive, non-recursive or bundle member selection) which bits we are looking at for the load-balancing decision.
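Conceptually (this is an illustrative sketch, not the actual NP microcode), picking a path or member amounts to masking the relevant number of bits out of the hash result:

```python
def select_index(hash_result: int, n_choices: int) -> int:
    """Pick a path/member index by masking bits out of the hash result.

    8-way recursive ECMP    -> 3 bits
    32-way non-recursive    -> 5 bits
    64 bundle members       -> 6 bits
    """
    bits = (n_choices - 1).bit_length()   # 8 -> 3, 32 -> 5, 64 -> 6
    return hash_result & ((1 << bits) - 1)

# The same hash result drives all selections; only the number of
# bits examined differs per load-balancing type:
h = 0b1011011  # example 7-bit hash result
print(select_index(h, 8))    # 3  (lower 3 bits: recursive path)
print(select_index(h, 64))   # 27 (lower 6 bits: bundle member)
```

In the real system the choice of which bits to look at is fixed per load-balancing stage, which is why the same packet can resolve a recursive path, a non-recursive path, and a bundle member from one hash computation.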
What is fed into the HASH depends on the scenario:
Incoming Traffic Type | Load-balancing Parameters
---|---
IPv4 | Source IP, Destination IP, Source port (TCP/UDP only), Destination port (TCP/UDP only), Router ID
IPv6 | Source IP, Destination IP, Source port (TCP/UDP only), Destination port (TCP/UDP only), Router ID
MPLS - IP payload, < 4 labels | Source IP, Destination IP, Source port (TCP/UDP only), Destination port (TCP/UDP only), Router ID
MPLS - IP payload, < 8 labels (from 6.2.3 onwards, Tomahawk and later ASR9K LCs) | Source IP, Destination IP, Source port (TCP/UDP only), Destination port (TCP/UDP only), Router ID. Typhoon LCs retain the original behaviour of supporting IP hashing for only up to 4 labels.
MPLS - IP payload, 9 or more labels | MPLS hashing on labels 3, 4 and 5 (labels 7, 8 and 9 from 7.1.2 onwards). Typhoon LCs retain the original behaviour of supporting IP hashing for only up to 4 labels.
MPLS - IP payload, > 4 labels | 4th (innermost) MPLS label and Router ID
MPLS - non-IP payload | Innermost MPLS label and Router ID
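As a hedged illustration of how the fields above feed the hash (the real NP hash function is proprietary; CRC32 here is merely a stand-in):

```python
import zlib

def flow_hash(src_ip: str, dst_ip: str, router_id: str,
              src_port: int = 0, dst_port: int = 0) -> int:
    """Illustrative flow hash over the fields from the table.

    Ports are folded in only for TCP/UDP (pass 0 otherwise), and the
    Router ID adds per-node variation. CRC32 stands in for the
    proprietary NP hash.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{router_id}".encode()
    return zlib.crc32(key)

# Flow-based hashing: the same inputs always yield the same hash,
# so all packets of a flow follow the same path.
h = flow_hash("10.0.0.1", "20.0.0.2", "8.8.8.8", 1024, 80)
print(h == flow_hash("10.0.0.1", "20.0.0.2", "8.8.8.8", 1024, 80))  # True
```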
* Non-IP payload includes Ethernet interworking, generally seen on Ethernet attachment circuits running VPLS/VPWS.
These frames have a construction of
EtherHeader-Mpls(next hop label)-Mpls(pseudowire label)-etherheader-InnerIP
In those scenarios the system uses the MPLS case with non-IP payload.
IP payload in MPLS is the common case for IP-based MPLS switching on LSRs, whereby an IP header is found directly after the inner label.
The Router ID is a value taken from an interface address on the system, in order to provide some per-node variation in the hash.
This value is determined at boot time only; which address the system has picked can be verified with:
show arm router-ids
Example:
RP/0/RSP0/CPU0:A9K-BNG#show arm router-id
Tue Aug 28 11:51:50.291 EDT
Router-ID Interface
8.8.8.8 Loopback0
RP/0/RSP0/CPU0:A9K-BNG#
This section is specific to bundles. A bundle can either be an AC or attachment circuit, or it can be used to route over.
Depending on how the bundle ether is used, different hash field calculations may apply.
When the bundle ether interface has an IP address configured, then we follow the ECMP load-balancing scheme provided above.
When the bundle-ether is used as an attachment circuit, meaning it has the "l2transport" keyword and is used in an xconnect or bridge-domain configuration, L2-based balancing is used by default: Source and Destination MAC with Router ID.
If you have two routers on each end of the ACs, the MACs hardly vary (in fact, not at all), so you may want to revert to L3-based balancing, which is configured under l2vpn:
RP/0/RSP0/CPU0:A9K-BNG#configure
RP/0/RSP0/CPU0:A9K-BNG(config)#l2vpn
RP/0/RSP0/CPU0:A9K-BNG(config-l2vpn)#load-balancing flow ?
src-dst-ip Use source and destination IP addresses for hashing
src-dst-mac Use source and destination MAC addresses for hashing
In this case the bundle ether has a configuration similar to
interface bundle-ether 100.2 l2transport
encap dot1q 2
rewrite ingress tag pop 1 symmetric
And the associated L2VPN configuration such as:
l2vpn
bridge group BG
bridge-domain BD
interface bundle-e100.2
In the downstream direction, by default we load-balance on the L2 information, unless "load-balancing flow src-dst-ip" is configured.
The attachment circuit type doesn't really matter in this case, whether it is a bundle or a single interface.
The associated configuration for this in the L2VPN is:
l2vpn
bridge group BG
bridge-domain BD
interface bundle-e100.2
vfi MY_VFI
neighbor 1.1.1.1 pw-id 2
interface bundle-ether 200
ipv4 add 192.168.1.1 255.255.255.0
router static
address-family ipv4 unicast
1.1.1.1/32 192.168.1.2
In this case neighbor 1.1.1.1 is found via routing, which happens to be egress out of our Bundle-Ether interface.
This traffic is MPLS encapped (PW) and therefore we will use MPLS-based load-balancing.
In this scenario we are just routing out the bundle Ethernet interface because our ADJ tells us so (as defined by the routing).
Config:
interface bundle-ether 200
ipv4 add 200.200.1.1 255.255.255.0
show route (OSPF inter area route)
O IA 49.1.1.0/24 [110/2] via 200.200.1.2, 2w4d, Bundle-Ether200
Even if this bundle-ether is MPLS enabled and we impose a label to reach the next hop, or do label swapping, in this case
the Ether header followed by the MPLS header has IP directly behind it.
We will therefore be able to do L3 load-balancing, as per the chart above.
As highlighted throughout this technote, load-balancing in MPLS scenarios, whether based on MPLS label or IP, depends on the inner encapsulation.
Depicted in the diagram below, we have an Ethernet frame with IP going into a pseudo wire switched through the LSR (P router) down to the remote PE.
Pseudowires in this case are encapsulating the complete frame (with ether header) into mpls with an ether header for the next hop from the PE left router to the LSR in the middle.
Although the number of labels is less than 4 AND there is IP available, the system can't skip past the Ether header to read the IP, and therefore falls back to MPLS label based load-balancing.
How the system differentiates between an IP header after the innermost label and a non-IP payload is explained here:
Just to recap, the MPLS header looks like this:
Now the important part of this picture is that it shows MPLS-IP. In the VPLS/VPWS case, the "GREEN" field will likely start with an Ethernet header.
Because hardware forwarding devices are limited in the number of PPS they can handle, which directly translates to the number of instructions available to process a packet, we want to handle a packet in the LEAST number of instructions possible.
In line with that thought process, we check the first nibble following the MPLS header: if it is a 4 (IPv4) or a 6 (IPv6) we ASSUME this is an IP header, and we interpret the data that follows as an IP header, deriving the L3 source and destination.
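The nibble check can be sketched as follows (illustrative logic only, not the actual microcode):

```python
def payload_looks_like_ip(payload: bytes) -> str:
    """Apply the single-nibble heuristic: look at the first nibble
    after the bottom-of-stack MPLS label."""
    first_nibble = payload[0] >> 4
    if first_nibble == 4:
        return "assume IPv4 header -> IP-based hashing"
    if first_nibble == 6:
        return "assume IPv6 header -> IP-based hashing"
    return "assume non-IP payload -> label-based hashing"

print(payload_looks_like_ip(bytes([0x45, 0x00])))  # IPv4 header, IHL=5
print(payload_looks_like_ip(bytes([0x00, 0x1B])))  # MAC starting 00: -> label-based
print(payload_looks_like_ip(bytes([0x4E, 0x3A])))  # MAC starting 4e: fools the check!
print(payload_looks_like_ip(bytes([0x00, 0x00, 0x00, 0x00])))  # control word
```

The last two calls show both the failure mode described below (a DMAC starting with 4 or 6 is misread as IP) and why the control word fixes it (the first nibble becomes 0).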
Now this works great in the majority of scenarios, because, hey, let's be honest, MAC addresses for the longest time started with 00-0...,
in other words not a 4 or 6, so we would default to MPLS-based balancing, which is what we wanted for VPLS/VPWS.
However, these days MAC addresses no longer always start with zeros; in fact, 4's and 6's are seen!
This fools the system into believing the inner packet is IP, while in reality it is an Ether header.
There is no good way to classify an ip header with a limited number of instruction cycles that would not affect performance.
In an ideal world you'd want to use an MD5 hash and all the checks possible to make the perfect decision.
Reality is different, and no one wants to pay the price of designing ASICs that run a very, very comprehensive battery of checks without affecting the PPS rate.
Bottom line is that if your DMAC starts with a 4 or 6 you have a situation.
Use the MPLS control word.
The control word is negotiated end to end and inserts a special 4 bytes of zeros precisely to accommodate this purpose.
The system will now read a 0 instead of a 4 or 6 and default to MPLS-based balancing.
To enable the control word, use the following template:
l2vpn
pw-class CW
encapsulation mpls
control-word
!
!
xconnect group TEST
p2p TEST_PW
interface GigabitEthernet0/0/0/0
neighbor 1.1.1.1 pw-id 100
pw-class CW
!
!
!
!
Since you might have little control over the inner (PW) label, and you probably want to ensure some sort of load-balancing, especially on P routers that have no knowledge of the offered service or of the MPLS packets they transport, another solution is available, known as FAT Pseudowire.
FAT PW inserts a "flow label", whose value is computed like a hash, to provide hop-by-hop variation and more granular load-balancing. Special care is taken that there is variation (based on the l2vpn command, see below), that no reserved label values are generated, and that generated values don't collide with allocated labels.
Fat PW is supported starting XR 4.2.1 on both Trident and Typhoon based linecards. From 6.5.1 onward we support FAT label over PWHE.
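Conceptually, the flow-label generation looks like this (an illustrative sketch; the real computation is internal to the NP, and it additionally avoids colliding with allocated labels):

```python
import zlib

RESERVED_MAX = 15  # MPLS reserved label range is 0..15

def flow_label(src_ip: str, dst_ip: str) -> int:
    """Derive a 20-bit flow label from src/dst IP (as with
    'load-balancing flow src-dst-ip'), skipping reserved values.
    CRC32 is a stand-in for the real hash."""
    label = zlib.crc32(f"{src_ip}->{dst_ip}".encode()) & 0xFFFFF  # 20 bits
    if label <= RESERVED_MAX:
        label += RESERVED_MAX + 1   # nudge out of the reserved range
    return label

lbl = flow_label("192.0.2.1", "198.51.100.7")
print(RESERVED_MAX < lbl <= 0xFFFFF)  # True: a valid, non-reserved label
```

Because the label varies per flow, every LSR along the path gets bottom-label variation to hash on, even when the PW label itself is constant.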
The following is a configuration example:
l2vpn
load-balancing flow src-dst-ip
pw-class test
encapsulation mpls
load-balancing
flow-label both static
!
!
You can also affect the way that the flow label is computed:
Under L2VPN configuration, use the “load-balancing flow” configuration command to determine how the flow label is generated:
l2vpn
load-balancing flow src-dst-mac
This is the default configuration, and will cause the NP to build the flow label from the source and destination MAC addresses in each frame.
l2vpn
load-balancing flow src-dst-ip
This is the recommended configuration, and will cause the NP to build the flow label from the source and destination IP addresses in each frame.
The Flow Aware Transport (FAT) PW signalled sub-TLV id originally carried value 0x11, as specified in the initial draft-ietf-pwe3-fat-pw. The draft was later corrected: the value should be 0x17, which is the flow label sub-TLV identifier assigned by IANA.
This matters when interoperating between XR 4.3.1 and earlier and XR 4.3.2 and later: all XR releases up to and including 4.3.1 that support FAT PW default to value 0x11; all XR releases from 4.3.2 onward default to value 0x17.
Solution:
Use the following config on XR version 4.3.2 and later to configure the sub-tlv id
pw-class <pw-name>
encapsulation mpls
load-balancing
flow-label both
flow-label code 17
NOTE: We got a lot of questions about the confusion between the statement of the 0x11 to 0x17 change (as driven by IANA) and the config requirement of the number 17 in this example.
The crux is that the flow-label code is configured in DECIMAL, while the IANA/draft numbers mentioned are HEX.
So 0x11, the old value, is 17 in decimal, which looks deceptively similar to 0x17, the new IANA-assigned number. Very annoying; thank IANA
(or we could have made the knob in hex, I guess)
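The decimal/hex relationship is easy to verify:

```python
# The old draft value 0x11 is 17 in decimal -- the same digits as the
# new IANA-assigned sub-TLV id 0x17 (which is 23 in decimal).
print(0x11)  # 17
print(0x17)  # 23
# So "flow-label code 17" (decimal) configures the OLD value 0x11,
# which is exactly what is needed to interop with 4.3.1 and earlier.
```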
In the case of VPWS or VPLS, at the ingress PE side, it’s possible to change the load-balance upstream to MPLS Core in three different ways:
1. At the L2VPN sub-configuration mode with “load-balancing flow” command with the following options:
RP/0/RSP1/CPU0:ASR9000(config-l2vpn)# load-balancing flow ?
src-dst-ip
src-dst-mac [default]
2. At the pw-class sub-configuration mode with “load-balancing” command with the following options:
RP/0/RSP1/CPU0:ASR9000(config-l2vpn-pwc-mpls-load-bal)#?
flow-label [see FAT Pseudowire section]
pw-label [per-VC load balance]
3. At the Bundle interface sub-configuration mode with “bundle load-balancing hash” command with the following options:
RP/0/RSP1/CPU0:ASR9000(config-if)#bundle load-balancing hash ? [For default, see previous sections]
dst-ip
src-ip
It’s important not only to understand these commands but also their precedence: 1 is weaker than 2, which is weaker than 3.
Example:
l2vpn
load-balancing flow src-dst-ip
pw-class FAT
encapsulation mpls
control-word
transport-mode ethernet
load-balancing
pw-label
flow-label both static
interface Bundle-Ether1
(...)
bundle load-balancing hash dst-ip
Because of the priorities, on the egress side of the ingress PE (to the MPLS Core), we will do per-dst-ip load-balance (3).
If the bundle-specific configuration is removed, we will do per-VC load-balance (2).
If the pw-class load-balance configuration is removed, we will do per-src-dst-ip load-balance (1).
with thanks to Bruno Oliveira for this priority section
Only one bundle member will be selected to forward traffic on the P2MP MPLS TE mid-point node.
Possible alternatives that would achieve better load balancing are: a) increase the number of tunnels or b) switch to mLDP.
In pre-4.2.0 releases, the IPv6 hash calculation uses only the last 64 bits of the address, folded and fed into the hash along with the regular Router ID and L4 info.
In 4.2.0 we made further enhancements so that the full IPv6 address is taken into consideration, together with L4 and Router ID.
You can determine the load-balancing on the router by using the following commands
For IP :
RP/0/RSP0/CPU0:A9K-BNG#show cef exact-route 1.1.1.1 2.2.2.2 protocol udp ?
source-port Set source port
You have the ability to only specify L3 info, or include L4 info by protocol with source and destination ports.
It is important to understand that the 9K does FLOW-based hashing: all packets belonging to the same flow take the same path.
If one flow is more active or requires more bandwidth than another, path utilization may not be a perfectly equal spread.
UNLESS you provide enough variation in L3/L4 randomness, this can't be alleviated; it is generally seen in lab tests due to the limited number of flows.
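A quick simulation illustrates why a small number of flows tends to give a lopsided spread while many flows even out (Python's built-in hash is just a stand-in for the NP hash):

```python
import random
from collections import Counter

def simulate(num_flows: int, num_paths: int = 4, seed: int = 1) -> Counter:
    """Hash each flow to one path; every packet of a flow sticks to it."""
    rng = random.Random(seed)
    flows = [(rng.randrange(2**32), rng.randrange(2**32))
             for _ in range(num_flows)]
    return Counter(hash(f) % num_paths for f in flows)

print(simulate(8))     # few flows: often a visibly uneven spread
print(simulate(8000))  # many flows: close to num_flows/num_paths each
```

This is also why lab tests with a handful of static flows routinely "prove" that load-balancing is broken when it is in fact working as designed.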
For MPLS based hashing :
RP/0/RSP0/CPU0:A9K-BNG#sh mpls forwarding exact-route label 1234 bottom-label 16000 ... location 0/1/cpu0
This command gives us the output interface chosen as a result of hashing with mpls label 16000. The bottom-label (in this case '16000') is either the VC label (in case of PW L2 traffic) or the bottom label of mpls stack (in case of mpls encapped L3 traffic with more than 4 labels). Please note that for regular mpls packets (with <= 4 labels) encapsulating an L3 packet, only IP based hashing is performed on the underlying IP packet.
Also note that the MPLS hash algorithm differs between Trident and Typhoon. The more varied the label, the better the distribution. However, Trident has a known behavior for the MPLS hash on bundle interfaces: if a bundle interface has an even number of member links, the MPLS hash will utilize only half of those links. To get around this, you may have to configure "cef load-balancing adjust 3" on the router, or use an odd number of member links within the bundle interface. Note that this limitation applies only to Trident line cards, not Typhoon.
RP/0/RSP0/CPU0:A9K-BNG#bundle-hash bundle-e 100 loc 0/0/cPU0
Calculate Bundle-Hash for L2 or L3 or sub-int based: 2/3/4 [3]: 3
Enter traffic type (1.IPv4-inbound, 2.MPLS-inbound, 3:IPv6-inbound): [1]: 1
Single SA/DA pair or range: S/R [S]:
Enter source IPv4 address [255.255.255.255]:
Enter destination IPv4 address [255.255.255.255]:
Compute destination address set for all members? [y/n]: y
Enter subnet prefix for destination address set: [32]:
Enter bundle IPv4 address [255.255.255.255]:
Enter L4 protocol ID. (Enter 0 to skip L4 data) [0]:
Invalid protocol. L4 data skipped.
Link hashed [hash_val:1] to is GigabitEthernet0/0/0/19 LON 1 ifh 0x4000580
The hash type L2 or L3 depends on whether you are using the bundle Ethernet interface as an Attachment Circuit in a Bridgedomain or VPWS crossconnect, or whether the bundle ether is used to route over (eg has an IP address configured).
Polarization pertains mostly to ECMP scenarios and is the effect of routers in a chain making the same load-balancing decision.
The following picture tries to explain that.
In this scenario we assume 2 buckets, i.e. 1 bit of a 7-bit hash result. Let's say that in this case we only look at bit 0, so it becomes an "EVEN" or "ODD" type decision. The routers in the chain have access to the same L3 and L4 fields; the only varying factor between them is the Router ID.
When the RIDs are similar or close (which is not uncommon), the system may not provide enough variation in the hash result, which eventually leads subsequent routers to compute the same hash and therefore polarize to a "Southern" (in the example above) or "Northern" path.
In XR 4.2.1 via a SMU or in XR 4.2.3 in the baseline code, we provide a knob that allows for shifting the hash result. By choosing a different "shift" value per node, we can make the system look at a different bit (for this example), or bits.
In this example the first line shifts the hash by 1, the second one shifts it by 2.
Considering that we have more buckets in the real implementation and more bits that we look at, the member or path selection can alter significantly based on the same hash but with the shifting, which is what we ultimately want.
Command
cef load-balancing algorithm adjust <value>
The command accepts values larger than 4 on Trident, but if you configure a value larger than 4 on Trident, a modulo is effectively applied, so a shift of 1 is the same as a shift of 5.
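Conceptually, the shift knob makes each node look at a different slice of the same hash result (a sketch of the idea, not the actual implementation):

```python
def pick_path(hash_result: int, num_paths: int, shift: int = 0) -> int:
    """Select a path from a shifted view of the hash, as with
    'cef load-balancing algorithm adjust <value>'."""
    return (hash_result >> shift) & (num_paths - 1)

h = 0b0110110  # same hash result computed by every router in the chain
# Without shifting, every node makes the SAME even/odd decision:
print([pick_path(h, 2, shift=0) for _ in range(3)])   # [0, 0, 0] -> polarized
# With a different shift per node, the chain de-polarizes:
print([pick_path(h, 2, shift=s) for s in (0, 1, 2)])  # [0, 1, 1]
```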
When the system detects fragmented packets, it will no longer use L4 information. The reason for that is that if L4 info were to be used, and subsequent fragments don't contain the L4 info anymore (have L3 header only!) the initial fragment and subsequent fragments produce a different hash result and potentially can take different paths resulting in out of order.
Regardless of release, regardless of hardware (ASR9K or CRS), when fragmentation is detected we only use L3 information for the hash computation.
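The fragmentation rule can be sketched as: once a packet is a fragment, build the key from L3 only, so the initial and subsequent fragments hash identically (illustrative):

```python
def hash_key(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
             is_fragment: bool) -> bytes:
    """Drop L4 info for fragments: non-initial fragments carry no L4
    header, so including ports would send a fragment train down
    different paths and cause reordering."""
    if is_fragment:
        return f"{src_ip}|{dst_ip}".encode()
    return f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()

# Initial fragment (has L4) and later fragment (L3 only) produce the
# same key, hence the same hash and the same path:
first = hash_key("10.0.0.1", "10.0.0.2", 1024, 80, is_fragment=True)
later = hash_key("10.0.0.1", "10.0.0.2", 0, 0, is_fragment=True)
print(first == later)  # True
```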
- Starting with release 6.4.2, when a layer 2 interface (EFP) receives MPLS-encapped IP packets, the hashing algorithm, if configured for src-dst-ip, picks up the IP from the ingress packet to create the hash. Before 6.4.2 the hash was based on MAC.
- Starting with XR 6.5, layer 2 interfaces receiving GTP-encapsulated packets automatically pick up the TEID to generate the hash when src-dst-ip is configured.
Xander Thuijs, CCIE #6775
Sr Tech Lead ASR9000
hi fernando,
yup, no problem doing that at all.
the pim session, ldp session and pim may take either one member, but the actual traffic, mpls encapped or mcast will be balanced according to the above algorithms
xander
Xander, can you shed some light on the following?
I have the following connections, with a L2VC between PEs in order to pass IP over Ethernet traffic from CORE1 to CORE2.
CORE1 <=> PE (ASR9k) <=ECMP=> PE (7600) <=> CORE2
So it's like the following in terms of headers:
EtherHeader-Mpls(next hop label)-Mpls(pseudowire label)-etherheader-InnerIP
The l2vpn config on the ASR9k is the following:
l2vpn
load-balancing flow src-dst-ip
!
bridge group CORE
bridge-domain CORE
interface TenGigE0/1/0/0.2816 <= AC
!
neighbor 10.201.201.9 pw-id 2816100002 <= PW
!
!
RP/0/RSP0/CPU0:ASR9k#sh l2vpn bridge-domain bd-name CORE det | i "Bala|PW:"
Load Balance Hashing: src-dst-ip
PW: neighbor 10.201.201.9, PW ID 2816100002, state is up ( established )
Load Balance Hashing: src-dst-ip
RP/0/RSP0/CPU0:ASR9k#sh l2vpn bridge-domain bd-name CORE det | b "List of Access PWs:"
List of Access PWs:
PW: neighbor 10.201.201.9, PW ID 2816100002, state is up ( established )
PW class not set, XC ID 0xc000002b
Encapsulation MPLS, protocol LDP
Source address 10.201.201.240
PW type Ethernet, control word disabled, interworking none
PW backup disable delay 0 sec
Sequencing not set
Load Balance Hashing: src-dst-ip
PW Status TLV in use
MPLS Local Remote
------------ ------------------------------ ---------------------------
Label 16042 206
Group ID 0x25 0x0
Interface Access PW ** CORE **
MTU 9200 9200
Control word disabled disabled
PW type Ethernet Ethernet
VCCV CV type 0x2 0x12
(LSP ping verification) (LSP ping verification)
VCCV CC type 0x6 0x6
(router alert label) (router alert label)
(TTL expiry) (TTL expiry)
------------ ------------------------------ ---------------------------
RP/0/RSP0/CPU0:ASR9k#sh cef 10.201.201.9
10.201.201.9/32, version 735, internal 0x4004001 (ptr 0xadab59b0) [1], 0x0 (0xad01834c), 0x440 (0xae47e050)
Updated Oct 3 02:54:59.320
remote adjacency to TenGigE0/1/0/3
Prefix Len 32, traffic index 0, precedence routine (0), priority 1
via 10.201.10.98, TenGigE0/1/0/3, 12 dependencies, weight 0, class 0 [flags 0x0]
path-idx 0 [0xae1f2504 0xae3e8110]
next hop 10.201.10.98
remote adjacency
local label 16060 labels imposed {ImplNull}
via 10.201.10.250, TenGigE0/2/0/2, 12 dependencies, weight 0, class 0 [flags 0x0]
path-idx 1 [0xae1f30e0 0xae3e816c]
next hop 10.201.10.250
remote adjacency
local label 16060 labels imposed {ImplNull}
RP/0/RSP0/CPU0:ASR9k#sh mpls forwarding prefix 10.201.201.9/32
Local Outgoing Prefix Outgoing Next Hop Bytes
Label Label or ID Interface Switched
------ ----------- ------------------ ------------ --------------- ------------
16060 Pop 10.201.201.9/32 Te0/1/0/3 10.201.10.98 18255747189680
Pop 10.201.201.9/32 Te0/2/0/2 10.201.10.250 1072700375625
I can see the traffic being load-balanced in the ASR9k => 7600 direction, but i cannot find the reason based on the above doc.
It looks like load-balancing is happening based on the InnerIP, but if I understand the doc above correctly, that's not supposed to work.
Thanks,
Tassos
hi tassos, the router makes LB decisions on the ingress LC. On the PE you still have access to the IP fields on the AC; the hash is derived there, and when an LB decision is to be made, that pre-computed hash is used.
Also see cisco live presentation 2904, which has some good detail on loadbalancing and some more use cases.
cheers
xander
Hi
Any idea on how to troubleshoot FAT PW float label ?
I have a chain of 5 ASR9k all of them linked with Bundle interfaces each of 2xTenG.
PE(ASR9k)<=B=> P (ASR9k) <=B=> P (MX240) <=B=>P (ASR9k)<=B=>PE(ASR9k)
I have set up a FAT PW end to end, loaded with a mix of 200 source and destination IPs, and one of the P routers is not balancing in any direction.
All the rest are balancing as expected: 2 PEs and 2 Ps. XR version 4.3.2, a mix of Trident and Typhoon cards, and RSP8G, RSP440TR and RSP440SE.
I noticed that bundle-hash doesn't have an option for the flow label?
BR
Bozhidar
Hi Bozhidar,
which device in this chain is not loadbalancing as expected?
remember that the PE that is imposing the fat label will NOT use it for its own loadbalancing decision. So I think PE-left,
if traffic is left to right, may be the one that is not balancing it correctly.
This is because the PE imposition path computes the hash BEFORE the fat label is inserted.
That cisco live preso 2904 referenced has some more detail in the LB section that discusses this in a bit more detail
regards
xander
Thank you for the quick reply-
Actually the first P from left to right. And what is more strange: the first PE (left) is balancing, and the Ps after the first one (left to right) are balancing as well, so the flow label must be there...
This is for traffic flowing left to right. For the opposite direction I have the same situation: all devices along the path are balancing, just this P again is not balancing when transmitting to the last PE.
PE(ASR9k)<=B=> P (NOT BALANCING in any direction) <=B=> P (MX240) <=B=>P (ASR9k)<=B=>PE(ASR9k)
that is interesting and I can't explain it!
can you let us know what version is running on that device and the installed SMUs; also send me the (bundle) interface configs to left and right and the mpls config.
I also need the cef outputs for the next hops out of those bundle interfaces to the connected devices, because maybe something is not right there.
Depending on that we may have a config issue or a bug, at which point we might need a TAC case to continue down the bug path, but let's check those outputs first.
regards
xander
OK let's see what i have -
RP/0/RSP0/CPU0:ASR9K_P1-2#show install active
Fri Oct 4 19:16:18.667 EEST
Secure Domain Router: Owner
Node 0/RSP0/CPU0 [RP] [SDR: Owner]
Boot Device: disk0:
Boot Image: /disk0/asr9k-os-mbi-4.3.2/0x100305/mbiasr9k-rsp3.vm
Active Packages:
disk0:asr9k-doc-px-4.3.2
disk0:asr9k-fpd-px-4.3.2
disk0:asr9k-k9sec-px-4.3.2
disk0:asr9k-mcast-px-4.3.2
disk0:asr9k-mgbl-px-4.3.2
disk0:asr9k-mini-px-4.3.2
disk0:asr9k-mpls-px-4.3.2
disk0:asr9k-optic-px-4.3.2
disk0:asr9k-services-px-4.3.2
Node 0/0/CPU0 [LC] [SDR: Owner]
Boot Device: mem:
Boot Image: /disk0/asr9k-os-mbi-4.3.2/lc/mbiasr9k-lc.vm
Active Packages:
disk0:asr9k-mcast-px-4.3.2
disk0:asr9k-mini-px-4.3.2
disk0:asr9k-mpls-px-4.3.2
disk0:asr9k-optic-px-4.3.2
disk0:asr9k-services-px-4.3.2
RP/0/RSP0/CPU0:ASR9K_P1-2#show platform
Fri Oct 4 19:16:35.682 EEST
Node Type State Config State
-----------------------------------------------------------------------------
0/RSP0/CPU0 A9K-RSP440-SE(Active) IOS XR RUN PWR,NSHUT,MON
0/0/CPU0 A9K-8T-L IOS XR RUN PWR,NSHUT,MON
RP/0/RSP0/CPU0:ASR9K_P1-2#
interface Bundle-Ether1
description Bundle to Right
mtu 9192
ipv4 address 10.30.0.5 255.255.255.252
load-interval 30
!
RP/0/RSP0/CPU0:ASR9K_P1-2#sh run int bundle-ether 2
Fri Oct 4 19:16:55.716 EEST
interface Bundle-Ether2
description Bundle to Left
mtu 9192
ipv4 address 10.30.0.1 255.255.255.252
load-interval 30
Fri Oct 4 19:17:26.309 EEST
mpls ldp
router-id 10.11.0.2
nsr
graceful-restart
graceful-restart reconnect-timeout 60
graceful-restart forwarding-state-holdtime 180
session protection
neighbor password encrypted 05061603320142081B
igp sync delay 10
log
neighbor
session-protection
nsr
!
mldp
!
interface Bundle-Ether1
!
interface Bundle-Ether2
!
interface TenGigE0/0/0/5
!
!
mpls oam
!
RP/0/RSP0/CPU0:ASR9K_P1-2#show mpls forwarding
Fri Oct 4 19:17:51.201 EEST
Local Outgoing Prefix Outgoing Next Hop Bytes
Label Label or ID Interface Switched
------ ----------- ------------------ ------------ --------------- ------------
16000 16017 MLDP LSM ID: 0x1 BE2 10.30.0.2 976675330
300304 MLDP LSM ID: 0x1 BE1 10.30.0.6 360907262
16001 16006 10.11.0.1/32 BE2 10.30.0.2 0
16002 Pop 10.21.0.4/32 BE2 10.30.0.2 404236741030
16003 16022 10.21.0.10/32 BE2 10.30.0.2 1181761
16004 Pop 10.30.0.8/30 BE2 10.30.0.2 0
16005 Pop 10.30.0.100/31 BE2 10.30.0.2 0
16006 Pop 40.0.0.0/22 BE2 10.30.0.2 0
16007 300240 10.11.0.3/32 BE1 10.30.0.6 0
16008 Pop 10.21.0.2/32 BE1 10.30.0.6 541225
16009 300288 10.21.0.3/32 BE1 10.30.0.6 854473
16010 300256 10.21.0.5/32 BE1 10.30.0.6 277329194115
16011 300272 100.2.0.0/24 BE1 10.30.0.6 0
16012 Unlabelled 10.30.0.16/30 BE1 10.30.0.6 0
16013 300240 10.30.0.24/30 BE1 10.30.0.6 0
16014 300240 10.30.0.200/31 BE1 10.30.0.6 0
16015 16019 MLDP LSM ID: 0x2 BE2 10.30.0.2 16434
16016 300320 MLDP LSM ID: 0x3 BE1 10.30.0.6 3168541638
16017 Aggregate ZTE: Per-VRF Aggr[V] \
ZTE 510300
RP/0/RSP0/CPU0:ASR9K_P1-2#show cef 10.30.0.2
Fri Oct 4 19:18:09.701 EEST
10.30.0.0/30, version 5, attached, connected, glean adjacency, internal 0xc0000c1 (ptr 0x71bf03c0) [1], 0x0 (0x7140c690), 0x0 (0x0)
Updated Oct 4 14:03:34.289
Prefix Len 30, traffic index 0, precedence n/a, priority 0
via Bundle-Ether2, 2 dependencies, weight 0, class 0 [flags 0x8]
path-idx 0 [0x70f143ec 0x0]
glean adjacency
RP/0/RSP0/CPU0:ASR9K_P1-2#show cef 10.30.0.6
Fri Oct 4 19:18:11.344 EEST
10.30.0.4/30, version 7, attached, connected, glean adjacency, internal 0xc0000c1 (ptr 0x71bf0570) [1], 0x0 (0x7140c730), 0x0 (0x0)
Updated Oct 4 14:03:35.593
Prefix Len 30, traffic index 0, precedence n/a, priority 0
via Bundle-Ether1, 2 dependencies, weight 0, class 0 [flags 0x8]
path-idx 0 [0x70f14440 0x0]
glean adjacency
RP/0/RSP0/CPU0:ASR9K_P1-2#
Quite an output. Everything looks ok to me? But see the traffic -
Bundle2 left
RP/0/RSP0/CPU0:ASR9K_P1-2#sh int tenGigE 0/0/0/0 | i rate
Fri Oct 4 19:18:51.712 EEST
30 second input rate 260301000 bits/sec, 37360 packets/sec
30 second output rate 508151000 bits/sec, 73532 packets/sec
RP/0/RSP0/CPU0:ASR9K_P1-2#sh int tenGigE 0/0/0/1 | i rate
Fri Oct 4 19:18:54.396 EEST
30 second input rate 252685000 bits/sec, 36400 packets/sec
30 second output rate 1000 bits/sec, 1 packets/sec
Bundle1 right
RP/0/RSP0/CPU0:ASR9K_P1-2#sh int tenGigE 0/0/0/3 | i rate
Fri Oct 4 19:19:03.392 EEST
30 second input rate 247453000 bits/sec, 35663 packets/sec
30 second output rate 512793000 bits/sec, 73745 packets/sec
RP/0/RSP0/CPU0:ASR9K_P1-2#sh int tenGigE 0/0/0/4 | i rate
Fri Oct 4 19:19:05.811 EEST
30 second input rate 263007000 bits/sec, 37870 packets/sec
30 second output rate 1000 bits/sec, 2 packets/sec
RP/0/RSP0/CPU0:ASR9K_P1-2#
thanks for that detail, I am thinking of something: do you happen to have a mac address that starts with a 4 or 6 by any chance? You may want to try adding the control word to the PW to make sure we are looking at the fat label instead of the (perceived) ip info in the payload.
Another thing: is this the only device you have with a trident bundle, or do the other devices have a trident card also?
Most interested in the hw config of the asr9k-P on the right.
regards
xander
I was thinking about this, but no, my IXIA is pushing only macs starting with 23:23:x:x:x:x.
I will try the control word trick and let you know anyhow. Now that I double-check, the other one with a trident bundle is the far-right PE, with the same card but a different RSP8G. The router to the right of the problematic P is actually a Juniper MX240.
Because it's working on all the other nodes I was sure I was missing some configuration on this problematic P, but actually there is nothing special to configure, which is why I am so frustrated.
Confirmed: the CW didn't change the behaviour.
yeah, if the mac doesn't start with 4 or 6, then the CW won't help; there is obviously incorrect balancing happening on your PE left. The RP version or type should have no bearing on it, as it is the hw that computes the hash, and that is the same for both PEs.
Also, it is the outbound LB that is incorrect, so the problem is local; if it were inbound, then I could have deflected it to the J .
Although the RP has no direct bearing on the hw forwarding, it could be a programming issue, but that sounds odd also.
Can you do this for me please:
RP/0/RSP0/CPU0:A9K-BNG#bundle-hash bundle-e100 loc 0/0/CPU0
Calculate Bundle-Hash for L2 or L3 or sub-int based: 2/3/4 [3]:
Enter traffic type (1.IPv4-inbound, 2.MPLS-inbound, 3:IPv6-inbound): [1]: 2
Number of ingress MPLS labels is 4 or less: y/n [y]: y
Enter MPLS payload type (1.IPv4, 2:IPv6, 3:other): [1]: 3
Enter the bottom label in decimal (20-bit value) :2
Link hashed [hash:199] to is GigabitEthernet0/0/0/19 ICL () LON 1 ifh 0x4000700
Another? [y]:
Enter the bottom label in decimal (20-bit value) :3
Link hashed [hash:200] to is GigabitEthernet0/0/0/9 ICL () LON 0 ifh 0x4000480
This command is currently broken in some ways (the member displayed is not always the actual member chosen), but it should give us an impression of whether it *can* balance on the label or not.
With all this detail captured, I would recommend filing a TAC case, as this needs to be fixed up.
If you happen to have a spare Typhoon card, it would be great if you could swap that in and see if it makes a difference, and likewise for the RSP type. I realize I am asking a lot, but if it is easy to do, it would be great additional detail to narrow down the precise issue and complete the DDTS filing.
regards
xander
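The point of the bundle-hash exercise above is that when the PW payload is not recognized as IP, the hash should be derived from the bottom label, so different labels should normally spread across the bundle members. A toy sketch of that idea (the hash function here is made up purely for illustration; it is not the real ASR9000 hash):

```python
# Hypothetical sketch: map a bottom label to a bundle member.
# None of the constants below reflect the real microcode; they only
# show that distinct bottom labels should normally land on distinct
# members of a 2-member bundle.
def member_for_label(bottom_label: int, members: list) -> str:
    h = bottom_label & 0xFF           # toy 8-bit "hash" of the bottom label
    return members[h % len(members)]  # modulo over the member count

bundle = ["Te0/0/0/3", "Te0/0/0/4"]
for label in (2, 3, 4, 5):
    print(label, "->", member_for_label(label, bundle))
```

With this toy hash, consecutive labels alternate between the two members; the box in the thread instead returned Te0/0/0/4 for every label tried, which is the symptom worth raising with TAC.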
Hey,
Thanks! I hope I got you right -
RP/0/RSP0/CPU0:ASR9K_P1-2#bundle-hash bundle-ether 1 location 0/0/CPU0
Fri Oct 4 21:01:35.116 EEST
Calculate Bundle-Hash for L2 or L3 or sub-int based: 2/3/4 [3]: 3
Enter traffic type (1.IPv4-inbound, 2.MPLS-inbound, 3:IPv6-inbound): [1]: 2
Number of ingress MPLS labels is 4 or less: y/n [y]: y
Enter MPLS payload type (1.IPv4, 2:IPv6, 3:other): [1]: 3
Enter the bottom label in decimal (20-bit value) :2
Link hashed [hash:57] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Another? [y]:
Enter the bottom label in decimal (20-bit value) :3
Link hashed [hash:59] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Yup, you understood the concept of what I was after, but do this for a variety of numbers and see if the control plane also picks the other member.
xander
Yep, I tried it many times; I always see the one interface -
RP/0/RSP0/CPU0:ASR9K_P1-2#bundle-hash bundle-ether 1 location 0/0/CPU0 DIRECTION RIGHT
Sat Oct 5 15:10:30.427 EEST
Calculate Bundle-Hash for L2 or L3 or sub-int based: 2/3/4 [3]:
Enter traffic type (1.IPv4-inbound, 2.MPLS-inbound, 3:IPv6-inbound): [1]: 2
Number of ingress MPLS labels is 4 or less: y/n [y]:
Enter MPLS payload type (1.IPv4, 2:IPv6, 3:other): [1]: 3
Enter the bottom label in decimal (20-bit value) :5
Link hashed [hash:63] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Another? [y]:
Enter the bottom label in decimal (20-bit value) :6
Link hashed [hash:65] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Another? [y]:
Enter the bottom label in decimal (20-bit value) :7
Link hashed [hash:67] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Another? [y]:
Enter the bottom label in decimal (20-bit value) :88
Link hashed [hash:229] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Another? [y]:
Enter the bottom label in decimal (20-bit value) :34
Link hashed [hash:121] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Another? [y]:
Enter the bottom label in decimal (20-bit value) :2345
Link hashed [hash:135] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Another? [y]:
Enter the bottom label in decimal (20-bit value) :2345
Link hashed [hash:135] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Another? [y]:
Enter the bottom label in decimal (20-bit value) :3242356
Invalid label. Label value range is 0-1048575
Another? [y]:
Enter the bottom label in decimal (20-bit value) :4324
Link hashed [hash:253] to is TenGigE0/0/0/4 ICL () LON 1 ifh 0x240
Another? [y]: 345
What is even stranger is that the traffic actually goes via the other interface of the bundle, 0/0/0/3:
RP/0/RSP0/CPU0:ASR9K_P1-2#sh int tenGigE 0/0/0/3 | i rate
Sat Oct 5 15:15:56.810 EEST
30 second input rate 1000 bits/sec, 1 packets/sec
30 second output rate 515656000 bits/sec, 73763 packets/sec
RP/0/RSP0/CPU0:ASR9K_P1-2#sh int tenGigE 0/0/0/4 | i rate
Sat Oct 5 15:16:00.493 EEST
30 second input rate 2000 bits/sec, 3 packets/sec
30 second output rate 1000 bits/sec, 2 packets/sec
Same for the other direction, left toward the PE: traffic goes out of 0/0/0/0, but the script always reports 0/0/0/1.
What is the best way to proceed? Is this enough information to open a TAC case and file a defect?
Off the subject, but still on this topic: do you have more information on the logic behind the hash algorithm used to calculate the flow label, and the function the P routers then use to make the switching decision on the bundle? I am asking because I have played a lot with the IXIA and noticed that the fewer flows I have (SRC and DST couples), the less evenly the traffic is divided between the ports. Mainly I am interested in whether the flow size (bandwidth) is somehow included in the calculation or not. My guess is that it is not, but...
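The intuition in the question above can be demonstrated with a short simulation. Per-flow load balancing hashes only the flow key (here a made-up src/dst pair), never the flow's bandwidth, so with few flows a couple of fat flows can land on the same member and skew the split, while with many flows the law of large numbers evens it out. This is a generic sketch, not the actual ASR9000 hash:

```python
import random

# Sketch: per-flow hashing ignores flow size, so evenness depends on
# the number of flows, not on anything the balancer "knows" about rates.
def simulate(num_flows: int, members: int = 2, seed: int = 1):
    rng = random.Random(seed)          # fixed seed for reproducibility
    load = [0] * members
    for _ in range(num_flows):
        src, dst = rng.getrandbits(32), rng.getrandbits(32)
        rate = rng.randint(1, 100)     # per-flow bandwidth (made up units)
        # only (src, dst) feeds the hash; 'rate' plays no part in the choice
        load[hash((src, dst)) % members] += rate
    return load

print(simulate(4))      # few flows: often a lopsided split
print(simulate(4000))   # many flows: close to 50/50
```

So yes, the observed behaviour (fewer flows, less even split) is exactly what hash-based, size-blind balancing predicts.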