04-16-2013 11:52 PM
Hello,
We've been experiencing a lot of performance issues over the last few months and I'm still trying to figure out where the problem is.
We have an IBM N series 6040 (= NetApp) storage system (2 controllers), connected to two Nexus 5548UP switches with Fibre Channel (2 fibres per controller, one to each Nexus 5548 => 4 fibres).
Our ESXi 5.1 servers are connected with FCoE to the two Nexus 5548UP switches through QLogic 8152 CNAs. Paths are managed with ALUA (NetApp provides ALUA support), so Round Robin runs on the optimized paths; a quick way to double-check that from the host side is sketched below.
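For reference, this is how one could verify the claiming SATP/PSP from an ESXi 5.1 shell; the device identifier is just a placeholder, not one of my LUNs:
# Confirm the NetApp LUNs are claimed by VMW_SATP_ALUA with VMW_PSP_RR
esxcli storage nmp device list
# List the paths of a single device (placeholder NAA id) and check which ones are Active (I/O)
esxcli storage nmp path list -d naa.600a0980xxxxxxxx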
I updated all the firmware on the hosts (Novascale R460, Dell R710) and on the QLogic 8152 cards.
The NetApp (IBM) support says "it's not the storage" ... so I'm now investigating the Nexus configuration.
I still see huge latencies, reported by esxtop and by alerts on the hosts ("latency increased to ..."); this latency sometimes reaches 300 ms, which VMware considers unacceptable.
Am I missing something in the Nexus config? I never had any training on it...
Here is the config, which is the same on both Nexus switches, except for the VSAN / FCoE VLAN ID.
interface Ethernet1/7
description ESX7
switchport mode trunk
switchport trunk allowed vlan xx,xxx,xxx,xxxx,xxxxxx,
channel-group 7
interface port-channel7
description ESX7
switchport mode trunk
switchport trunk allowed vlan xx,xxx,xxx,xxxx,xxxxxx,
speed 10000
vpc 7
interface vfc7
bind interface port-channel7
switchport description VFC ESX7
no shutdown
SAN Connection:
interface fc1/32
switchport trunk allowed vsan Y
switchport description N6040-CTRLA
no shutdown
interface fc2/16
switchport trunk allowed vsan Y
switchport description N6040-CTRLB
no shutdown
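In case it helps others compare, these are the NX-OS checks I would use to confirm that the vfc and the storage FC ports are up in the right VSAN and logged into the fabric (VSAN Y is the same placeholder as in the config above):
show interface vfc7
show interface fc1/32 brief
show vsan membership
show flogi database vsan Y
show zoneset active vsan Y
The vfc should be trunking with VSAN Y up, fc1/32 and fc2/16 should be F ports in VSAN Y, and both N6040 controllers plus the host CNAs should appear in the FLOGI database and in the same active zones.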
sho inter eth1/7 counters errors
--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Eth1/7 0 0 0 0 0 0
--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
Eth1/7 0 0 0 0 0 0
--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Eth1/7 0 -- 0 0 0 0
sho inter po7 counters errors
--------------------------------------------------------------------------------
Port Align-Err FCS-Err Xmit-Err Rcv-Err UnderSize OutDiscards
--------------------------------------------------------------------------------
Po7 0 0 0 0 0 0
--------------------------------------------------------------------------------
Port Single-Col Multi-Col Late-Col Exces-Col Carri-Sen Runts
--------------------------------------------------------------------------------
Po7 0 0 0 0 0 0
--------------------------------------------------------------------------------
Port Giants SQETest-Err Deferred-Tx IntMacTx-Er IntMacRx-Er Symbol-Err
--------------------------------------------------------------------------------
Po7 0 -- 0 0 0 0
sho inter fc1/30 counters
fc1/30
1 minute input rate 1776008 bits/sec, 222001 bytes/sec, 140 frames/sec
1 minute output rate 360048 bits/sec, 45006 bytes/sec, 81 frames/sec
4532202353 frames input, 5364654889448 bytes
0 class-2 frames, 0 bytes
4532202353 class-3 frames, 5364654889448 bytes
0 class-f frames, 0 bytes
0 discards, 0 errors, 0 CRC
0 unknown class, 0 too long, 0 too short
9317351721 frames output, 14089148468736 bytes
0 class-2 frames, 0 bytes
9317351721 class-3 frames, 14089148468736 bytes
0 class-f frames, 0 bytes
0 discards, 0 errors
0 input OLS, 0 LRR, 0 NOS, 0 loop inits
1 output OLS, 1 LRR, 0 NOS, 0 loop inits
0 link failures, 0 sync losses, 0 signal losses
0 transmit B2B credit transitions from zero
0 receive B2B credit transitions from zero
16 receive B2B credit remaining
3 transmit B2B credit remaining
0 low priority transmit B2B credit remaining
sho inter fc1/32 counters
fc1/32
1 minute input rate 222837768 bits/sec, 27854721 bytes/sec, 14937 frames/sec
1 minute output rate 86227648 bits/sec, 10778456 bytes/sec, 6377 frames/sec
119702843694 frames input, 206144432348384 bytes
0 class-2 frames, 0 bytes
119702843694 class-3 frames, 206144432348384 bytes
0 class-f frames, 0 bytes
0 discards, 0 errors, 0 CRC
0 unknown class, 0 too long, 0 too short
44140587957 frames output, 56851588018912 bytes
0 class-2 frames, 0 bytes
44140587957 class-3 frames, 56851588018912 bytes
0 class-f frames, 0 bytes
0 discards, 0 errors
2 input OLS, 2 LRR, 0 NOS, 0 loop inits
7 output OLS, 2 LRR, 4 NOS, 0 loop inits
3 link failures, 1 sync losses, 0 signal losses
0 transmit B2B credit transitions from zero
0 receive B2B credit transitions from zero
16 receive B2B credit remaining
1 transmit B2B credit remaining
0 low priority transmit B2B credit remaining
I have no flow control enabled (could enabling it solve the problem? The SAN is connected with native FC, not FCoE).
sho inter eth1/7 flowcontrol
--------------------------------------------------------------------------------
Port Send FlowControl Receive FlowControl RxPause TxPause
admin oper admin oper
--------------------------------------------------------------------------------
Eth1/7 off off off off 0 0
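For what it's worth, with FCoE the lossless behaviour comes from priority flow control (PFC) on the FCoE class, not from 802.3x link-level flowcontrol, so "off" above is expected. Where I would look instead for per-class pauses and drops on the 5548 (interface taken from the config above):
show interface priority-flow-control
show queuing interface ethernet 1/7
show policy-map system
Non-zero PFC pause counters or drops in the FCoE / no-drop class during the latency spikes would point at congestion on that link; the system policy-map shows which class is actually configured as no-drop.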
Has anyone already experienced these issues, or could someone give me some advice?
Thanks in advance...
07-15-2013 05:32 AM
Hi,
were you able to find the cause? We have a very similar problem here. The only difference is that we're using NFS instead of FCoE.
Thanks and regards,
Sven
08-15-2013 09:23 PM
Having the exact same issue here. Using 5548UPs, same configs on my Nexus switches as you, same port reports... going native FC 4Gb to a NetApp 2040 and FCoE to Emulex CNAs on HP servers. ESXi reports up to 30,000 ms (yes, 30K) of latency when doing heavy I/O; at idle it is normal. No drops, no errors at all. Windows 2012 does the same thing... but even worse!
So we know it is not the OS, because two different OSes do it, and we know it is not the CNA, because you have QLogic and we have Emulex; that leaves only the NetApp or the Nexus. Well, we also run iSCSI on the NetApp and do not experience this there, so this leads me to believe there is some major issue with FCoE on the Nexus... particular to NetApp perhaps? We also have a NetApp 3250 behind the same Nexus 5548UPs, doing native FCoE on the array too, so native FCoE to native FCoE: same issues. We should be getting 500 MB/s and we are getting 100 MB/s peak, which drops to 50 MB/s during I/O. When we went from 1Gb iSCSI on Cisco 3750s, where we got 50-100 MB/s, to Nexus with 10Gb and FC/FCoE, we ended up with literally the SAME performance. Huh?
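Not a root-cause fix on the switch side, but one host-side knob worth verifying when Round Robin throughput looks capped is how many I/Os ESXi sends down a path before rotating (the default is 1000). This is only a sketch; the device name is a placeholder and any change should be validated with NetApp/VMware first:
# Show the current Round Robin settings for a device (placeholder NAA id)
esxcli storage nmp psp roundrobin deviceconfig get -d naa.600a0980xxxxxxxx
# Optionally rotate after every I/O instead of every 1000
esxcli storage nmp psp roundrobin deviceconfig set -d naa.600a0980xxxxxxxx --type=iops --iops=1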
12-09-2013 04:10 AM
We are having similar issues as well.
A dot1q-tunneling network built on WS-C3750X-24T-L switches, with 10 Gbit between them.
Edge switches are connected with LACP EtherChannels to the 3750X devices.
The NetApp is connected via 10 Gbit to a 3750X, with a 2x1 Gbit LACP fallback to an edge switch.
VMware/ESXi 5.1 hosts are connected to an edge switch with a 2x1 Gbit EtherChannel.
When we activate the 10 Gbit interface for the NetApp, the latency goes through the roof. Wireshark shows a lot of strange things such as duplicate ACKs and TCP zero window.
When we connect an ESXi host directly to a 10 Gbit port on a new switch and add the NetApp there as well, that part of the communication works very well, while at the same time the remote ESXi hosts still have the strange latency issues, since their traffic travels over the dot1q network.
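Duplicate ACKs and TCP zero window with NFS usually point at drops or buffer exhaustion somewhere along the 10G-to-1G speed-mismatch path rather than at the storage itself. A sketch of what could be checked on the 3750X side (the interface names are placeholders for the NetApp-facing 10G port and a 1G uplink):
show interfaces Te1/1/1 counters errors
show interfaces Gi1/0/1 | include Total output drops
show mls qos interface Gi1/0/1 statistics
The OutDiscards / total output drop counters, and the per-queue drop counters from the mls qos statistics (if mls qos is enabled), show whether the switch is tail-dropping bursts that never appear as sustained high utilization.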
Did you guys find any solution or pinpoint the cause of this behaviour?
//Rob
02-04-2014 01:03 AM
The reason for our issue was what the Cisco engineer called "micro bursts": they fill up the output queue but don't really show up as high bandwidth usage.
So there were drops on the outgoing interface towards our 3750 stack with its 4x1 Gbit EtherChannel.
We rebuilt the network, removed this device and built a true 10 Gbit network. Now everything works just fine!
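For anyone who cannot rebuild straight away: a mitigation that is sometimes tried on the 3750X for microburst-induced output drops is reallocating the egress buffers and thresholds with the queue-set commands. This is only an illustrative sketch (the values are examples, not from this thread) and needs to be tested against your own traffic profile:
! Illustrative only: give queue-set 1 more buffer for the default queue and relax its thresholds
mls qos
mls qos queue-set output 1 buffers 15 30 35 20
mls qos queue-set output 1 threshold 2 3100 3100 100 3200
The idea is to give the queue carrying the storage traffic a larger share of the buffer pool and let it borrow from the common pool during bursts, at the cost of the other queues.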
03-27-2014 11:55 AM
IBM, that's the problem. LOL