Nexus vPC load balancing, polarization

We have a recurring issue across all of our N9Ks.  Each site has a pair of N9Ks connected redundantly to many Catalyst switches.  All traffic going out to these switches uses only one link; the N9Ks do not load balance the traffic going out to the connected switches.  Failover works as expected.

For example, in the simplest setup, we have a pair of N9Ks with redundant connections to a customer's Cisco router, with the ports configured as access ports.  The customer is a service provider that offers free Wi-Fi to many customers.  All traffic leaving our N9Ks prefers switch 1 and is not load balanced at all.  We are contracted to provide a redundant 2Gbps (1Gbps + 1Gbps) service.  We are currently maxing out one of the 1Gbps links during peak times, resulting in dropped packets.  Again, failover works as expected.  The customer router does not have 10Gbps ports.

We were on nxos.7.0.3.I7.10.1 and tried upgrading to nxos.9.3.9, which did not change anything.

Both switches have this config:

interface port-channel60
description LAG-60: Trunk to XXXXX 2Gbps service
switchport access vlan 299
vpc 60

interface Ethernet1/30
description LAG-60 XXXXX 2Gbps Circuit
switchport access vlan 299
spanning-tree port type edge
channel-group 60 mode active

 

switch 1:

show port-channel load-balance
System config:
Non-IP: src-dst mac
IP: src-dst mac rotate 0
Port Channel Load-Balancing Configuration for all modules:
Module 1:
Non-IP: src-dst mac
IP: src-dst mac rotate 0

sh port-channel traffic int port 60
NOTE: Clear the port-channel member counters to get accurate statistics

ChanId Port Rx-Ucst Tx-Ucst Rx-Mcst Tx-Mcst Rx-Bcst Tx-Bcst
------ --------- ------- ------- ------- ------- ------- -------
60 Eth1/30 92.06% 100.00% 89.53% 93.95% 0.0% 0.0%

 

sh int e1/30
Ethernet1/30 is up
admin state is up, Dedicated Interface
Belongs to Po60
Hardware: 1000/10000 Ethernet, address: cc46.d6b3.9af1 (bia cc46.d6b3.9af1)
Description: LAG-60 MCCS 2Gbps Circuit
MTU 1500 bytes, BW 1000000 Kbit , DLY 10 usec
reliability 255/255, txload 138/255, rxload 7/255
Encapsulation ARPA, medium is broadcast
Port mode is access
full-duplex, 1000 Mb/s, media type is 1G
Beacon is turned off
Auto-Negotiation is turned on FEC mode is Auto
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
admin fec state is auto, oper fec state is off
Last link flapped 00:52:22
Last clearing of "show interface" counters 00:42:48
0 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 28293864 bits/sec, 9282 packets/sec
30 seconds output rate 542656384 bits/sec, 52958 packets/sec
input rate 28.29 Mbps, 9.28 Kpps; output rate 542.66 Mbps, 52.96 Kpps
Load-Interval #2: 5 minute (300 seconds)
300 seconds input rate 29151240 bits/sec, 8958 packets/sec
300 seconds output rate 505885544 bits/sec, 50074 packets/sec
input rate 29.15 Mbps, 8.96 Kpps; output rate 505.89 Mbps, 50.07 Kpps
RX
24968459 unicast packets 92 multicast packets 0 broadcast packets
24968551 input packets 9663539018 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
130388484 unicast packets 346 multicast packets 0 broadcast packets
130388830 output packets 165826559252 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 3254413 output discard
0 Tx pause

switch 2:

sh port-channel load-balance
System config:
Non-IP: src-dst mac
IP: src-dst mac rotate 0
Port Channel Load-Balancing Configuration for all modules:
Module 1:
Non-IP: src-dst mac
IP: src-dst mac rotate 0

sh port-channel traffic int port 60
NOTE: Clear the port-channel member counters to get accurate statistics

ChanId Port Rx-Ucst Tx-Ucst Rx-Mcst Tx-Mcst Rx-Bcst Tx-Bcst
------ --------- ------- ------- ------- ------- ------- -------
60 Eth1/30 76.56% 0.0% 65.21% 85.14% 0.0% 0.0%

sh int e1/30
Ethernet1/30 is up
admin state is up, Dedicated Interface
Belongs to Po60
Hardware: 1000/10000 Ethernet, address: cc46.d6b3.9e55 (bia cc46.d6b3.9e55)
Description: LAG-60 MCCS 2Gbps Circuit
MTU 1500 bytes, BW 1000000 Kbit , DLY 10 usec
reliability 255/255, txload 1/255, rxload 9/255
Encapsulation ARPA, medium is broadcast
Port mode is access
full-duplex, 1000 Mb/s, media type is 1G
Beacon is turned off
Auto-Negotiation is turned on FEC mode is Auto
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
admin fec state is auto, oper fec state is off
Last link flapped 00:38:29
Last clearing of "show interface" counters 00:51:43
1 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 36408568 bits/sec, 9263 packets/sec
30 seconds output rate 456 bits/sec, 0 packets/sec
input rate 36.41 Mbps, 9.26 Kpps; output rate 456 bps, 0 pps
Load-Interval #2: 5 minute (300 seconds)
300 seconds input rate 45266168 bits/sec, 10345 packets/sec
300 seconds output rate 248 bits/sec, 0 packets/sec
input rate 45.27 Mbps, 10.35 Kpps; output rate 248 bps, 0 pps
RX
23668767 unicast packets 101 multicast packets 0 broadcast packets
23668868 input packets 8934126866 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
0 unicast packets 1560 multicast packets 0 broadcast packets
1560 output packets 142946 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
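
As a rough cross-check of the numbers above, the following Python sketch uses only values copied from the two `sh int e1/30` outputs. It computes the link utilization, each switch's share of transmitted bytes, and the ratio of output discards to transmitted packets. Relating discards to transmitted packets in this way is an assumption about how the counters relate, not something the outputs state.

```python
# Values copied from the "sh int e1/30" outputs of switch 1 and switch 2.
LINK_BPS = 1_000_000_000  # "BW 1000000 Kbit"

switch1 = {
    "tx_bps_30s": 542_656_384,     # 30 seconds output rate
    "tx_packets": 130_388_830,     # output packets
    "tx_bytes": 165_826_559_252,   # output bytes
    "output_discards": 3_254_413,  # output discard
}
switch2 = {
    "tx_bps_30s": 456,
    "tx_packets": 1_560,
    "tx_bytes": 142_946,
    "output_discards": 0,
}


def utilization(sw):
    """Fraction of the 1 Gbps link used by the 30-second output rate."""
    return sw["tx_bps_30s"] / LINK_BPS


def discard_ratio(sw):
    """Output discards per transmitted packet (assumed relationship)."""
    if sw["tx_packets"] == 0:
        return 0.0
    return sw["output_discards"] / sw["tx_packets"]


def tx_byte_share(sw, other):
    """Share of all bytes transmitted on Po60 that were carried by `sw`."""
    return sw["tx_bytes"] / (sw["tx_bytes"] + other["tx_bytes"])


if __name__ == "__main__":
    print(f"switch 1 utilization: {utilization(switch1):.1%}")
    print(f"switch 2 utilization: {utilization(switch2):.6%}")
    print(f"switch 1 discard ratio: {discard_ratio(switch1):.2%}")
    print(f"switch 1 share of TX bytes: {tx_byte_share(switch1, switch2):.4%}")
```

The 30-second average on switch 1 works out to roughly 54% of the link, which lines up with txload 138/255, while switch 2 transmits almost nothing. An average like this can hide short peaks, which would be consistent with discards appearing at peak times.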

 

Accepted Solutions

Hello!

Based on this most recent topology, let's assign some names to specific devices to make the conversation easier.

  • Customer-Router (rightmost router connected to Customers)
  • Customer-N9K-1 (top switch in the diagram on the Customer side)
  • Customer-N9K-2 (bottom switch in the diagram on the Customer side)
  • Internet-N9K-1 (top switch in the diagram on the Internet side)
  • Internet-N9K-2 (bottom switch in the diagram on the Internet side)
  • Internet-Router (leftmost router connected to Internet)

Let's also assume that your traffic is flowing from your customers to the Internet (although you would likely see this same issue in the reverse direction, and I believe you've identified that's the case too).

When traffic from your customers enters Customer-Router, it will route the packet and choose the Port-channel60 interface (which has two members - Gi0/0/2 and Gi0/0/3) as the egress interface. It will subsequently hash the packet out of one of the Port-channel60 members (either Gi0/0/2 or Gi0/0/3).

Let's say Customer-Router chooses to route this packet out of Gi0/0/2, and let's also say Gi0/0/2 connects to Customer-N9K-1. When Customer-N9K-1 switches this packet/frame according to the packet's destination MAC address, it will choose vPC Po60 as the egress interface. Even though Po60 is a vPC, Customer-N9K-1 will choose to forward this frame out of interface Ethernet1/30; it will not attempt to load balance traffic across the vPC Peer-Link to Customer-N9K-2's vPC Po60.

This is a key point - when two Nexus switches are in a vPC domain, one vPC peer is not cognizant of what or how much data plane traffic the other vPC peer is forwarding. The other vPC peer could be forwarding several terabits of traffic, or none at all. Therefore, the two vPC peers will not forward data plane traffic across the vPC Peer-Link in an attempt to load balance data plane traffic between the two peers.
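
The forwarding rule described above can be sketched in a few lines of Python. This is a toy model, not NX-OS code: the function name `vpc_egress` and the dictionaries are hypothetical. The only behavior it encodes is that a vPC peer sends a frame out of its own local member when one is up, and uses the peer-link only when its local vPC is down (as in the `show vpc brief` legend).

```python
from collections import Counter

PEER_OF = {
    "Customer-N9K-1": "Customer-N9K-2",
    "Customer-N9K-2": "Customer-N9K-1",
}


def vpc_egress(ingress_switch, local_member_up):
    """Pick the switch whose Po60 member transmits the frame.

    local_member_up maps switch name -> True if its Po60 member is up.
    Returns None when neither peer has a usable member.
    """
    if local_member_up[ingress_switch]:
        return ingress_switch      # the local member is always preferred
    peer = PEER_OF[ingress_switch]
    if local_member_up[peer]:
        return peer                # local vPC down: cross the peer-link
    return None


def simulate(ingress_sequence, local_member_up):
    """Count how many frames leave through each switch."""
    return Counter(vpc_egress(sw, local_member_up) for sw in ingress_sequence)


if __name__ == "__main__":
    both_up = {"Customer-N9K-1": True, "Customer-N9K-2": True}
    # Every frame arrives on Customer-N9K-1, so every frame leaves there too.
    print(simulate(["Customer-N9K-1"] * 100, both_up))
```

In this model the egress split between the two peers simply mirrors the ingress split, so a lopsided arrival pattern produces a lopsided departure pattern.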

Therefore, the only way to solve this polarization issue is to focus on the routers sending traffic towards the Nexus switches. It is the duty of Customer-Router to balance traffic it forwards towards Customer-N9K-1 and Customer-N9K-2. We will need to investigate its hashing algorithm (as well as the profile of traffic it is trying to send towards the Internet) in order to resolve this issue.
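
To illustrate why the sending device's hash inputs and traffic profile matter, here is a hedged Python sketch. It does not reproduce any real router or ASIC hash: `pick_member` is a hypothetical helper that uses SHA-256 as a stand-in, and the MAC and IP addresses are made-up documentation values. It only shows the general effect: if every packet carries the same source and destination MAC, a MAC-only hash always picks the same member, while a hash that includes IP addresses and L4 ports can spread flows across members.

```python
import hashlib
from collections import Counter

MEMBERS = ["Gi0/0/2", "Gi0/0/3"]  # Port-channel60 members on the router


def pick_member(fields, members=MEMBERS):
    """Map a list of header fields to one member link (stand-in hash)."""
    key = "|".join(str(field) for field in fields).encode()
    digest = hashlib.sha256(key).digest()
    return members[int.from_bytes(digest[:4], "big") % len(members)]


def make_flows(count=1000):
    """Hypothetical flows: one MAC pair, many client IPs and source ports."""
    return [
        {
            "src_mac": "00:00:5e:00:53:01",
            "dst_mac": "00:00:5e:00:53:02",
            "src_ip": f"198.51.100.{i % 250 + 1}",
            "dst_ip": "192.0.2.10",
            "src_port": 1024 + i,
            "dst_port": 443,
        }
        for i in range(count)
    ]


def distribution(flows, keys):
    """Count flows per member when hashing only the given header fields."""
    return Counter(pick_member([flow[k] for k in keys]) for flow in flows)


if __name__ == "__main__":
    flows = make_flows()
    print("src-dst mac:", dict(distribution(flows, ["src_mac", "dst_mac"])))
    print(
        "src-dst ip + l4port:",
        dict(distribution(flows, ["src_ip", "dst_ip", "src_port", "dst_port"])),
    )
```

With the MAC-only key, all flows in this made-up profile land on a single member; with the wider key they are split across both. Whether the real devices behave this way depends on their actual hash configuration and traffic, which is what needs to be checked.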

I hope this helps explain this behavior in a bit more detail - thank you!

-Christopher


balaji.bandi
Hall of Fame

This is the Nexus side; how is the device on the other side configured?

The current setting is the default, which is generally the suggested method.

If you would like to try a different method, you can check the options below:

You can configure the device to use one of the following methods to load-balance across the port channel:

  • Destination MAC address

  • Source MAC address

  • Source and destination MAC address

  • Destination IP address

  • Source IP address

  • Source and destination IP address

  • Source TCP/UDP port number

  • Destination TCP/UDP port number

  • Source and destination TCP/UDP port number

  • GRE inner IP headers with source, destination, and source-destination

 

BB


Please share the output of the following from both N9Ks:

show etherchannel summary

or

show port-channel summary

show vpc brief

show spanning-tree

MHM

Here is the router side:

interface Port-channel60
ip address x.x.x.x m.m.m.m
negotiation auto

interface GigabitEthernet0/0/2
no ip address
no ip redirects
no ip unreachables
no ip proxy-arp
negotiation auto
channel-group 60 mode active

interface GigabitEthernet0/0/3
no ip address
no ip redirects
no ip unreachables
no ip proxy-arp
negotiation auto
channel-group 60 mode active

 

Currently the N9Ks are set to source and destination MAC address.

Switch 1:

show port-channel summary
Flags: D - Down P - Up in port-channel (members)
I - Individual H - Hot-standby (LACP only)
s - Suspended r - Module-removed
b - BFD Session Wait
S - Switched R - Routed
U - Up (port-channel)
p - Up in delay-lacp mode (member)
M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port- Type Protocol Member Ports
Channel
--------------------------------------------------------------------------------

10 Po10(SU) Eth LACP Eth1/53(P) Eth1/54(P)
60 Po60(SU) Eth LACP Eth1/29(D) Eth1/30(P)

 

show vpc brief
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id : 10
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 consistency status : success
vPC role : primary, operational secondary
Number of vPCs configured : 14
Peer Gateway : Enabled
Dual-active excluded VLANs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Enabled, timer is off.(timeout = 240s)
Delay-restore status : Timer is off.(timeout = 30s)
Delay-restore SVI status : Timer is off.(timeout = 10s)
Operational Layer3 Peer-router : Disabled
Virtual-peerlink mode : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id Port Status Active vlans
-- ---- ------ -------------------------------------------------
1 Po10 up 3-5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,39
,41,45,47,49,51,65-74,76,200-203,205-207,254,
299-322,360,400-401,500,610,700,800-801

vPC status
----------------------------------------------------------------------------
Id Port Status Consistency Reason Active vlans
-- ------------ ------ ----------- ------ ---------------

60 Po60 up success success 299

 

 


show spanning-tree vlan 299


VLAN0299
Spanning tree enabled protocol rstp
Root ID Priority 4395
Address 0023.04ee.be01
Cost 2
Port 4105 (port-channel10)
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec

Bridge ID Priority 33067 (priority 32768 sys-id-ext 299)
Address 0023.04ee.be0a
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec

Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Po10 Root FWD 1 128.4105 (vPC peer-link) Network P2p
Po20 Desg FWD 1 128.4115 (vPC) P2p
Po45 Root FWD 1 128.4140 (vPC) P2p
Po60 Desg FWD 1 128.4155 (vPC) P2p
Eth1/3 Desg FWD 4 128.3 P2p
Eth1/21 Desg FWD 2 128.21 P2p
Eth1/43 Desg FWD 2 128.43 P2p
Eth1/46 Desg FWD 2 128.46 P2p


Switch 2:

show port-channel summary
Flags: D - Down P - Up in port-channel (members)
I - Individual H - Hot-standby (LACP only)
s - Suspended r - Module-removed
b - BFD Session Wait
S - Switched R - Routed
U - Up (port-channel)
p - Up in delay-lacp mode (member)
M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port- Type Protocol Member Ports
Channel
--------------------------------------------------------------------------------
10 Po10(SU) Eth LACP Eth1/53(P) Eth1/54(P)
60 Po60(SU) Eth LACP Eth1/29(D) Eth1/30(P)

 

show vpc brief
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id : 10
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 consistency status : success
vPC role : secondary, operational primary
Number of vPCs configured : 14
Peer Gateway : Enabled
Dual-active excluded VLANs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Enabled, timer is off.(timeout = 240s)
Delay-restore status : Timer is off.(timeout = 30s)
Delay-restore SVI status : Timer is off.(timeout = 10s)
Operational Layer3 Peer-router : Disabled
Virtual-peerlink mode : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id Port Status Active vlans
-- ---- ------ -------------------------------------------------
1 Po10 up 3-5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,39
,41,45,47,49,51,65-74,76,200-203,205-207,254,
299-322,360,400-401,500,610,700,800-801

vPC status
----------------------------------------------------------------------------
Id Port Status Consistency Reason Active vlans
-- ------------ ------ ----------- ------ ---------------

60 Po60 up success success 299

 

show spanning-tree vlan 299

VLAN0299
Spanning tree enabled protocol rstp
Root ID Priority 4395
Address 0023.04ee.be01
Cost 1
Port 4140 (port-channel45)
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec

Bridge ID Priority 33067 (priority 32768 sys-id-ext 299)
Address 0023.04ee.be0a
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec

Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Po10 Desg FWD 1 128.4105 (vPC peer-link) Network P2p
Po20 Desg FWD 1 128.4115 (vPC) P2p
Po45 Root FWD 1 128.4140 (vPC) P2p
Po60 Desg FWD 1 128.4155 (vPC) P2p
Eth1/24 Desg FWD 2 128.24 P2p

 

60 Po60(SU) Eth LACP Eth1/29(D) Eth1/30(P) <<- this is N9K-1

60 Po60(SU) Eth LACP Eth1/29(D) Eth1/30(P) <<- this is N9K-2

interface GigabitEthernet0/0/2
no ip address
no ip redirects
no ip unreachables
no ip proxy-arp
negotiation auto
channel-group 60 mode active

interface GigabitEthernet0/0/3
no ip address
no ip redirects
no ip unreachables
no ip proxy-arp
negotiation auto
channel-group 60 mode active

On the router, two interfaces are used for the port channel that connects to both N9Ks, but as I see it there are four ports across the two N9Ks, where there should be two, one for each N9K.

MHM

 

 

Correct, but this isn't the reason for this behavior.  We tried moving both ports from the router to a single N9K to see if it would load balance on a single switch.  I can remove these ports from the config; it doesn't change anything.

 

We have dual N9Ks in many locations, with a very basic config, only basic switching.  We have dual N9Ks with vPC EtherChannels to other dual N9Ks, dual N9Ks to almost 100 Catalyst IOS switches, and this router, all with the same issue and always with outbound traffic.

 

I mention the ports because it could be a cabling issue.
Can you confirm that the router connects to each N9K vPC switch with one correct link?

MHM

Thanks again for continued support.  I went ahead and took out E1/29 from the port channel on both N9Ks.  There is no IGP between us and the customer, we are a L2 transport for them to the Internet.  

Switch 1:

show port-channel summary
Flags: D - Down P - Up in port-channel (members)
I - Individual H - Hot-standby (LACP only)
s - Suspended r - Module-removed
b - BFD Session Wait
S - Switched R - Routed
U - Up (port-channel)
p - Up in delay-lacp mode (member)
M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port- Type Protocol Member Ports
Channel
--------------------------------------------------------------------------------
60 Po60(SU) Eth LACP Eth1/30(P)

show port-channel traffic int por 60
NOTE: Clear the port-channel member counters to get accurate statistics

ChanId Port Rx-Ucst Tx-Ucst Rx-Mcst Tx-Mcst Rx-Bcst Tx-Bcst
------ --------- ------- ------- ------- ------- ------- -------
60 Eth1/30 100.00% 100.00% 100.00% 100.00% 0.0% 0.0%

 

 

Switch 2:

show port-channel summary interface port 60
Flags: D - Down P - Up in port-channel (members)
I - Individual H - Hot-standby (LACP only)
s - Suspended r - Module-removed
b - BFD Session Wait
S - Switched R - Routed
U - Up (port-channel)
p - Up in delay-lacp mode (member)
M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port- Type Protocol Member Ports
Channel
--------------------------------------------------------------------------------
60 Po60(SU) Eth LACP Eth1/30(P)

show port-channel traffic int por 60
NOTE: Clear the port-channel member counters to get accurate statistics

ChanId Port Rx-Ucst Tx-Ucst Rx-Mcst Tx-Mcst Rx-Bcst Tx-Bcst
------ --------- ------- ------- ------- ------- ------- -------
60 Eth1/30 100.00% 0.0% 100.00% 100.00% 0.0% 0.0%

I have also tried the following commands:

port-channel load-balance src-dst mac rotate 32
port-channel load-balance src-dst l4port
port-channel load-balance src-dst l4port rotate 32
port-channel load-balance src-dst ip symmetric

port-channel load-balance src l4port
port-channel load-balance src ip-l4port rotate 32
port-channel load-balance src mac rotate 32

These did not help at all.  Strangely, there used to be a bug ID posted, CSCvq26885, but it seems to be an internal ID that has been removed from the original posting.
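
One possible reading of why none of these commands changed the distribution, in line with the accepted answer: each N9K has only one active local member in Po60 (Eth1/30), so whatever the hash value is, there is only one local port to map it to. A tiny Python sketch of that arithmetic follows; `select_local_member` is a hypothetical helper, not an NX-OS function.

```python
def select_local_member(hash_value, local_members):
    """Map a hash value onto the locally active port-channel members."""
    return local_members[hash_value % len(local_members)]


# Po60 as seen on each N9K in "show port-channel summary": Eth1/30(P) only.
local_members = ["Eth1/30"]

# Any hash value, from any load-balance method, selects the same port.
chosen = {select_local_member(h, local_members) for h in range(10_000)}
```

Under this assumption, changing the hash fields or the rotate value on the N9Ks only changes the hash value, not the set of ports it can select.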

 

OK, so there is a port-channel config with a specific VLAN on both N9Ks, and there is an L3 port-channel config on the router.
Did you configure a default route on both N9Ks toward the L3 port-channel IP to access the Internet?

MHM

[Image: topology diagram]

This is the basic setup.  We have the polarization issue even between the 4 N9k's, always in the outbound direction towards the customer router.

You have four N9Ks in two vPC pairs. Are you using the same vPC domain on both pairs?
For the port channel that connects the two N9K vPC pairs, can you check its status on all four N9Ks and whether its STP state is FWD or BLK?
STP may be blocking one link and keeping the other, or LACP may not be working, so that one link is P and the other is S.
Please share the output here if you can.

thanks 

MHM

Mr. Hart,

Thanks for the explanation, that makes sense. We will start looking more closely at the Internet-side devices and then share our findings.

MHM, I will follow up with your questions soon.

 
