08-29-2014 03:00 AM - edited 03-07-2019 08:34 PM
Hi, is anyone aware of any quirks or special configurations required when connecting Intel 10GbE NICs to Nexus switches? We've run into a number of problems. Have I missed anything obvious? Details below.
Setup no. 1: a Catalyst 4900M, two Dell R510s with dual-port Intel 82599EB NICs (8086:151c), and two Dell R720xds with dual-port Intel X540-AT2 NICs (8086:1528). The servers run CentOS 6.4, and the 10GbE interfaces are bonded in mode 4 (802.3ad/LACP).
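For reference, a minimal CentOS 6-style mode-4 bond looks like the sketch below (device names eth2/eth3 are assumptions, not our exact config; IP details omitted):

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 -- bond master (sketch)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=802.3ad miimon=100"   # mode 4 = 802.3ad/LACP, MII link polling

# /etc/sysconfig/network-scripts/ifcfg-eth2 -- one of the two 10GbE slaves
# (an identical ifcfg-eth3 exists for the second port)
DEVICE=eth2
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
```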
Problem no. 1: The R720xd boxes occasionally lose network connectivity. One of the interfaces in the bond goes down for 4 seconds, then recovers. This happens at random. Sometimes the other interface in the bond goes down as well before the first has recovered, and that is when the machine loses connectivity. The R510 boxes, however, do not exhibit this behaviour.
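When it happens, each flap shows up in the kernel log as an ixgbe link-down/link-up pair. A minimal sketch for counting those events (the log sample and interface names are made up for illustration; on an affected host, pipe in `dmesg` or /var/log/messages instead):

```shell
# Hypothetical sample of ixgbe link messages; on a real host replace
# with: log=$(dmesg)
log='ixgbe 0000:01:00.0: eth2: NIC Link is Down
ixgbe 0000:01:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
ixgbe 0000:01:00.1: eth3: NIC Link is Down
ixgbe 0000:01:00.1: eth3: NIC Link is Up 10 Gbps, Flow Control: RX/TX'

# Count link-down events across all interfaces
downs=$(printf '%s\n' "$log" | grep -c 'NIC Link is Down')
echo "link-down events: $downs"
```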
Setup no. 2: The Catalyst 4900M was replaced with a Nexus 3064T switch.
Problem no. 2: The R510s and R720xds swapped roles. About a week after the switch was replaced, the R510 NICs started flapping at an alarming rate, going down several times a minute.
Workaround: I have unretired the 4900 and moved the R510 machines back to it. Since then, no interface flapping has been observed. That is, the R510s are now on the 4900 and the R720xds on the 3064.
Interface/teaming config on the 4900:
interface Port-channel1
switchport
switchport trunk allowed vlan 1,32
switchport mode trunk
!
interface TenGigabitEthernet2/3
switchport trunk allowed vlan 1,32
switchport mode trunk
channel-group 1 mode active
spanning-tree bpduguard enable
!
interface TenGigabitEthernet3/5
switchport trunk allowed vlan 1,32
switchport mode trunk
channel-group 1 mode active
spanning-tree bpduguard enable
!
The corresponding config on the 3064:
interface port-channel1
switchport mode trunk
switchport trunk allowed vlan 1,32
no negotiate auto
!
interface Ethernet1/1
switchport mode trunk
switchport trunk allowed vlan 1,32
spanning-tree port type edge
spanning-tree bpduguard enable
channel-group 1 mode active
!
interface Ethernet1/17
switchport mode trunk
switchport trunk allowed vlan 1,32
spanning-tree port type edge
spanning-tree bpduguard enable
channel-group 1 mode active
!
09-02-2014 07:35 AM
Hi Lars,
Did you update the firmware on all your hardware components (NIC/BIOS)?
09-02-2014 07:49 AM
Yes, all Dell machines had their firmware and BIOS updated to the latest versions available through OMSA. I also applied Intel's preboot updates, admittedly not the latest (v19 vs. v19.3).
I now have hard evidence that the problem has nothing to do with teaming as such. We have other machines here with the Intel 82599EB card (ProLiant 360p g8) that use only a single port, and they have exactly the same problem on the 3064. As another data point, a group of custom-built servers with Intel X540-AT2 NICs does not show the problem. This exactly mirrors the behaviour of the Dells in the original post.
09-02-2014 09:30 AM
Hi Lars,
Please check this out:
https://supportforums.cisco.com/discussion/12291366/intel-corporation-82599eb-10-gigabit-receive-missed-errors
09-02-2014 02:45 PM
I don't think this applies. We observe the problem across driver versions, from 3.9.15-k (the latest in CentOS 6.4) through 3.17.3 to 3.18.7. The only correlation I have right now is the combination of NIC and switch:
82599EB + 4900 is good.
X540-AT2 + 3064 is good.
Other combinations are not.
Thanks.
09-03-2014 04:47 AM
I'm currently thinking along the lines of an "interesting"/buggy driver combined with behavioural differences between the two switch models. Does anyone know whether the RX/TX flow-control settings on the switches must, or should, match the NIC settings? I found that the 3064 defaults to RX/TX flow control off on all interfaces, whereas the 4900 enables it on connected ports with link up (all our NICs have flow control enabled, but I don't know whether that is what makes the 4900 enable it). Nothing related to flow control has been explicitly configured on either switch.
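For comparison, the per-interface pause state can be read with `ethtool -a` on the server (and `show interface flowcontrol` on the Nexus side). A small sketch that pulls the pause flags out of that output (the sample text and the interface name eth2 are hard-coded assumptions for illustration):

```shell
# Parse `ethtool -a`-style output for the pause flags. On a live host,
# replace the sample with: sample=$(ethtool -a eth2)
sample='Pause parameters for eth2:
Autonegotiate:  on
RX:             on
TX:             on'

autoneg=$(printf '%s\n' "$sample" | awk '/^Autonegotiate:/ {print $2}')
rx=$(printf '%s\n' "$sample" | awk '/^RX:/ {print $2}')
tx=$(printf '%s\n' "$sample" | awk '/^TX:/ {print $2}')
echo "autoneg=$autoneg rx=$rx tx=$tx"
```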
09-09-2014 04:00 AM
While I still don't understand the root cause, or the specifics of flow-control negotiation, experimentation shows that I can work around the problem by either configuring all ports on the 3064 that connect to 82599EB cards with
flowcontrol receive on
flowcontrol send on
or configuring those interfaces on the server side to turn the pause options off (autoneg/rx/tx) with ethtool.
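On CentOS 6, the server-side variant can be made persistent via ETHTOOL_OPTS in the interface's ifcfg file. A sketch (the interface name eth2 is an assumption; initscripts passes an option string that starts with "-" to ethtool as-is):

```shell
# One-off (lost on reboot): ethtool -A eth2 autoneg off rx off tx off
# Persistent -- add to /etc/sysconfig/network-scripts/ifcfg-eth2:
ETHTOOL_OPTS="-A eth2 autoneg off rx off tx off"
```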