
Nexus switch issues with Intel 10GbE cards and bonding/teaming?

lars.hecking
Level 1

Hi, is anyone aware of any quirks or special configurations required when connecting Intel 10GbE NICs to Nexus switches? We've run into a number of problems. Have I missed anything obvious? Details below.

Setup no. 1: a Catalyst 4900M, two Dell R510s with dual-port Intel 82599EB (8086:151c), and two Dell R720xds with 2x Intel X540-AT2 (8086:1528). The servers are running CentOS 6.4 and the 10GbE interfaces are bonded in mode 4 (802.3ad/LACP).
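For reference, the bonds are set up in the usual CentOS 6 initscripts way; the snippet below is only an illustrative sketch (bond0, eth2 and eth3 are placeholder names, not the exact files from these hosts):

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  ONBOOT=yes
  BOOTPROTO=none
  # mode 4 = 802.3ad/LACP; miimon polls link state every 100 ms
  BONDING_OPTS="mode=4 miimon=100"

  # /etc/sysconfig/network-scripts/ifcfg-eth2 (likewise for eth3)
  DEVICE=eth2
  ONBOOT=yes
  BOOTPROTO=none
  MASTER=bond0
  SLAVE=yes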

Problem no. 1: The R720xd boxes occasionally lose network connection. One of the interfaces in the bond goes down for 4 seconds, then recovers. This happens at random. Sometimes, the other interface in the bond goes down as well, before the first one has recovered, and that's when the machine loses network connection. The R510 boxes, however, do not exhibit this behaviour.
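The flaps can be tracked on the server side with the standard bonding status and the kernel log; bond0 and the ixgbe driver name below are just the obvious candidates for this hardware, not output from these machines:

  # per-slave MII status, LACP partner details and link failure counts
  cat /proc/net/bonding/bond0

  # bonding/ixgbe driver messages around the time of a flap
  dmesg | grep -iE 'bond0|ixgbe'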

Setup no. 2: The Catalyst 4900M has been replaced with a Nexus 3064T switch.

Problem no. 2: The R510s and R720xds have swapped roles. About a week after the switch was replaced, the R510 NICs started flapping at an alarming rate, going down several times every minute.

Workaround: I have unretired the 4900 and moved the R510 machines back to it. Since then, no interface flapping has been observed. That is, the R510s are now on the 4900 and the R720xds on the 3064.

 

Interface/teaming config on the 4900:

interface Port-channel1
 switchport
 switchport trunk allowed vlan 1,32
 switchport mode trunk
!
interface TenGigabitEthernet2/3
 switchport trunk allowed vlan 1,32
 switchport mode trunk
 channel-group 1 mode active
 spanning-tree bpduguard enable
!
interface TenGigabitEthernet3/5
 switchport trunk allowed vlan 1,32
 switchport mode trunk
 channel-group 1 mode active
 spanning-tree bpduguard enable
!

The corresponding config on the 3064:

interface port-channel1
  switchport mode trunk
  switchport trunk allowed vlan 1,32
  no negotiate auto
!
interface Ethernet1/1
  switchport mode trunk
  switchport trunk allowed vlan 1,32
  spanning-tree port type edge
  spanning-tree bpduguard enable
  channel-group 1 mode active
!
interface Ethernet1/17
  switchport mode trunk
  switchport trunk allowed vlan 1,32
  spanning-tree port type edge
  spanning-tree bpduguard enable
  channel-group 1 mode active
!

6 Replies

richbarb
Cisco Employee

Hi Lars,

 

Did you update the firmware of all your hardware components (NIC/BIOS)?

 

Yes, all Dell machines had their firmware and BIOS updated to the latest available through OMSA. I also applied Intel's preboot updates, admittedly not the latest version (v19 vs. 19.3).

I now have hard evidence that the problem has nothing to do with teaming as such. We have other machines here with the Intel 82599EB card (ProLiant 360p g8), with only a single port in use, and they show exactly the same problem on the 3064. As another data point, a further group of (custom-built) servers with Intel X540-AT2 NICs does not show the problem. This exactly mirrors the behaviour of the Dells in the original post.

Hi Lars,

 

Please check this out:

https://supportforums.cisco.com/discussion/12291366/intel-corporation-82599eb-10-gigabit-receive-missed-errors

I don't think this applies. We observe the problem across driver versions, ranging from 3.9.15-k (the latest shipped with CentOS 6.4) through 3.17.3 to 3.18.7. The only correlation I have right now is the combination of NIC and switch.

82599EB + 4900 is good.

X540-AT2 + 3064 is good.

Other combinations are not.

Thanks.

 

I'm currently thinking along the lines of an "interesting"/buggy driver combined with behavioural differences between the two switch models. Does anyone know whether the RX/TX flow control settings on the switches must, or should, match the NIC settings? I found that the 3064 defaults to RX/TX flow control off on all interfaces, whereas the 4900 enables it on connected ports with link up (all NICs have it enabled, but I don't know whether that is how the 4900 decides to enable it). Nothing related to flow control has been explicitly configured.
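In case anyone wants to compare their own setup, the two ends can be checked with the following (the interface name is just an example):

  # server side: current pause (flow control) settings of the NIC
  ethtool -a eth2

  # 3064 side (NX-OS): per-port receive/send flow control state
  show interface flowcontrol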

 

While I still don't understand the cause of this problem, or the specifics of flow control negotiation, some experimentation shows that I can work around it either by configuring all ports on the 3064 that link to 82599EB cards with

  flowcontrol receive on
  flowcontrol send on

or by configuring those interfaces on the server side to turn the pause options (autoneg/rx/tx) off with ethtool.
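That is, something along these lines, with eth2 as a placeholder name (to make it persistent it needs to go into the ifcfg scripts or a local ifup hook):

  # disable pause autonegotiation and RX/TX pause frames on the NIC
  ethtool -A eth2 autoneg off rx off tx off

  # verify the result
  ethtool -a eth2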

 
