
Dell Compellent SC8000 SAN Connectivity to Cisco Nexus 3172

Michael Medwid
Level 1

My storage engineers are seeing 20-30 ms of latency to and from the Dell/VMware ESX servers, as shown in the Dell management utility, and they attribute it to the network. Dell also blames the network and says the problem is that flow control is not enabled.

When I tried to enable flow control, I was disappointed to find:

sw-3172-a(config-if)# flowcontrol receive on
ERROR: This CLI is not supported on n3k platform

and later saw: 

"link-level flowcontrol (LLFC) is not supported on the Nexus 3000 and 3100 series. It is supported on the Nexus 3500 series and Nexus 9000"

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus3000/sw/release/70322c/n3k_70322c_nxos_rn.html


Is that the end of the story? I see there is another type of flow control, "priority flow control". Would that likely serve the same purpose for the Dell Compellent's needs? It looks like a lot of configuration. What I am seeing is the Compellent SAN sending pause frames to the 3172 (the RxPause counters below). Any insight on this pairing is appreciated!

 

# sho int flowcontrol

--------------------------------------------------------------------------------
Port       Send FlowControl   Receive FlowControl   RxPause     TxPause
           admin    oper      admin    oper
--------------------------------------------------------------------------------
Eth1/1     off      off       off      off          622377865   0
Eth1/2     off      off       off      off          632037851   0
Eth1/3     off      off       off      off          374231740   0
Eth1/4     off      off       off      off          377617928   0
Eth1/5     off      off       off      off          888908      0
Eth1/6     off      off       off      off          16576       0

 

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus3000/sw/qos/602_U1_1/b_3k_QoS_Config_602_U11/b_3k_QoS_Config_602_U11_chapter_0101.html


matthew.oh
Level 1

Hi Michael, 

 

We have the exact same problem with almost the same hardware.

We have an SC4020 with a Nexus 3172PQ. When I connect the SAN to the Nexus, disk latency goes over 300 ms and throughput drops to 10M.

When we RDP to a VM, it always shows the warning window asking us to confirm that the server was turned off.

There is QoS with "priority flow control", but it doesn't work at all.

Our SAN was configured with "CoS priority 0", so I adjusted all values accordingly, but the problem doesn't go away.

However, if I connect the SAN to Extreme switches, we get less than 1 ms of latency and 10 Gb/s of speed.

Dell engineers pointed out that flow control is not enabled, and I told them QoS is doing it.

I have had a TAC case open for about 3 weeks, but nothing has changed.

 

I will update this thread when we find a resolution.

 

Here are some outputs:

 

Ethernet1/10 is up
admin state is up, Dedicated Interface
Hardware: 100/1000/10000 Ethernet, address: 7070.8be0.8371 (bia 7070.8be0.8371)
Description: HYPV-S03_S7_P2
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, medium is broadcast
Port mode is access
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Auto-Negotiation is turned on FEC mode is Auto
Input flow-control is off, output flow-control is off

NEXUS-03# sh int prio

============================================================
Port Mode Oper(VL bmap) RxPPP TxPPP
============================================================

Ethernet1/1 On Off 0 0
Ethernet1/2 On Off 0 0
Ethernet1/3 Auto Off 0 0
Ethernet1/4 Auto Off 0 0
Ethernet1/5 Auto Off 0 0
Ethernet1/6 Auto Off 0 0
Ethernet1/7 Auto Off 0 0
Ethernet1/8 On On (0) 0 0
Ethernet1/9 Auto Off 0 0
Ethernet1/10 On On (0) 0 0
Ethernet1/11 Auto Off 0 0
Ethernet1/12 On On (0) 0 0

NEXUS-03# sh que interf e1/1

slot 1
=======


HW MTU of Ethernet1/1 : 9216 bytes

Egress Queuing for Ethernet1/1 [System]
------------------------------------------------------------------------------
QoS-Group# Bandwidth% PrioLevel Shape QLimit
Min Max Units
------------------------------------------------------------------------------
1 70 - - - - 7(D)
0 30 - - - - 7(D)
+-------------------------------------------------------------------+
| QOS GROUP 0 |
+-------------------------------------------------------------------+
| | Unicast | OOBFC Unicast | Multicast |
+-------------------------------------------------------------------+
| Tx Pkts | 270368| 0| 815|
| Tx Byts | 490691215| 0| 90202|
| Dropped Pkts | 0| 0| 0|
| Dropped Byts | 0| 0| 0|
| Q Depth Byts | 0| 0| 0|
+-------------------------------------------------------------------+
| CONTROL QOS GROUP |
+-------------------------------------------------------------------+
| | Unicast | OOBFC Unicast | Multicast |
+-------------------------------------------------------------------+
| Tx Pkts | 850| 0| 1486|
| Tx Byts | 72460| 0| 99938|
| Dropped Pkts | 0| 0| 0|
| Dropped Byts | 0| 0| 0|
| Q Depth Byts | 0| 0| 0|
+-------------------------------------------------------------------+

Port Egress Statistics
--------------------------------------------------------
WRED Drop Pkts 0
WRED Non ECN Drop Pkts 0

 

NEXUS-03# sh hardware internal interface ethernet 1/1 asic counters
Important Counters/Drops
--------------- --------- --------- --------- --------- --------- ---------
Interface Name Forward Forward Error Pkt Error Pkt QOS Rx QOS Tx
RxDrops TxDrops RxDrops TxDrops Drops Drops
--------------- --------- --------- --------- --------- --------- ---------
Ethernet1/1 16384015 0 676704 681324 0 0
--------------- --------- --------- --------- --------- --------- ---------
Summary view may double count some stats, look at Detailed Counters

One thing I would note is that the Compellent SAN sends a LOT of pause frames to the Nexus 3172 (counted as RxPause on the switch). I believe that is the Compellent telling the Nexus to slow down because it can't handle what's coming at it. 1) It makes me wonder why the Compellent can't handle the traffic. I've never run into this with any other SAN, e.g. NetApp or Tegile. Are they just using archaic Ethernet NICs? 2) I wonder if there is a way to turn OFF flow control on the Compellent. The idea is that something that's supposed to enhance performance is actually just causing a mess. I don't have access to our SAN. Do you have any thoughts on that?

 

# sho int flowcontrol

--------------------------------------------------------------------------------
Port       Send FlowControl   Receive FlowControl   RxPause     TxPause
           admin    oper      admin    oper
--------------------------------------------------------------------------------
Eth1/1     off      off       off      off          623682057   0
Eth1/2     off      off       off      off          633354213   0
Eth1/3     off      off       off      off          374706752   0
Eth1/4     off      off       off      off          378087903   0
Eth1/5     off      off       off      off          888910      0
Eth1/6     off      off       off      off          16583       0
Eth1/7     off      off       off      off          45411       0
Eth1/8     off      off       off      off          217034      0
Eth1/9     off      off       off      off          3898905     0
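
For anyone comparing counters over time: the two `sho int flowcontrol` captures posted in this thread can be diffed to see which ports are actively receiving pause frames, as opposed to carrying old totals. Below is a minimal Python sketch that parses rows pasted from both outputs and prints the RxPause growth per port. The helper names (`parse_rx_pause`, `rx_pause_delta`) are hypothetical, and the time between the two captures is not stated in the thread, so the sketch reports only a raw delta and not a rate.

```python
import re

# Data rows copied from the two outputs posted in this thread.
FIRST = """\
Eth1/1 off off off off 622377865 0
Eth1/2 off off off off 632037851 0
Eth1/3 off off off off 374231740 0
Eth1/4 off off off off 377617928 0
Eth1/5 off off off off 888908 0
Eth1/6 off off off off 16576 0
"""

SECOND = """\
Eth1/1 off off off off 623682057 0
Eth1/2 off off off off 633354213 0
Eth1/3 off off off off 374706752 0
Eth1/4 off off off off 378087903 0
Eth1/5 off off off off 888910 0
Eth1/6 off off off off 16583 0
Eth1/7 off off off off 45411 0
Eth1/8 off off off off 217034 0
Eth1/9 off off off off 3898905 0
"""

# Port name, four admin/oper columns, then the RxPause and TxPause counters.
ROW = re.compile(r"^(Eth\S+)\s+(?:\S+\s+){4}(\d+)\s+(\d+)\s*$")


def parse_rx_pause(text):
    """Return {port: RxPause} for every data row in the pasted output."""
    counters = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def rx_pause_delta(before, after):
    """Return RxPause growth for ports that appear in both snapshots."""
    return {port: after[port] - before[port] for port in before if port in after}


deltas = rx_pause_delta(parse_rx_pause(FIRST), parse_rx_pause(SECOND))
for port, delta in deltas.items():
    print(f"{port}: +{delta} RxPause frames")
```

Ports that appear only in the second capture (Eth1/7 through Eth1/9) are skipped because there is no earlier value to compare against.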

Turning off the pause frames on the Compellent (which I don't think you can do) would be similar to cutting the wire to the check engine light in your car...

The pause frames are not the cause. They are simply an indicator of underlying storage processor/buffering issues, as well as a flow control mechanism that is trying to get the sender to slow down.

What NIC speeds are involved?

These issues are usually caused by disk/storage-level latency (the SAN's inability to keep up with IOPS/throughput demands).

Do you have other types of traffic traversing the 3172 besides iSCSI/NFS storage traffic?

If you don't have jumbo frames enabled, I would make sure you get that configured, as I've seen it give storage throughput a 25-30% bump.

Thanks,

Kirk...
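
To put the jumbo-frame suggestion in rough numbers (the Ethernet1/10 output earlier in the thread shows MTU 1500), here is a small Python sketch of the per-frame arithmetic. It assumes standard Ethernet framing overhead, IPv4 and TCP headers without options, and a host-side jumbo MTU of 9000; none of these values come from the thread, and iSCSI PDU headers are ignored. The function names are hypothetical.

```python
# Per-frame cost on the wire in bytes, beyond the MTU-sized IP packet:
# Ethernet header (14) + FCS (4) + preamble/SFD (8) + inter-frame gap (12).
ETH_OVERHEAD = 14 + 4 + 8 + 12

# Assumption: IPv4 (20) + TCP (20) headers with no options.
IP_TCP_HEADERS = 20 + 20


def payload_efficiency(mtu):
    """Fraction of on-wire bytes that carry TCP payload."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_OVERHEAD
    return payload / on_wire


def frames_needed(total_bytes, mtu):
    """Number of full-size frames needed to carry total_bytes of payload."""
    payload = mtu - IP_TCP_HEADERS
    return -(-total_bytes // payload)  # integer ceiling division


for mtu in (1500, 9000):
    eff = payload_efficiency(mtu)
    frames = frames_needed(2**30, mtu)
    print(f"MTU {mtu}: {eff:.2%} payload efficiency, {frames} frames per GiB")
```

Under these assumptions the header savings alone are only a few percentage points, while the frame count per GiB drops by roughly a factor of six. A gain as large as the 25-30% mentioned above would therefore presumably come from fewer frames to process per byte, not from the header savings.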

 

Hi Michael,

 

I read this on the Dell support forum, where someone said: "This is a problem on any NX OS after nxos.7.0.3.I2.2a.bin, I've tried so many versions and it seems it will never work again. Theory is they were allowing something they shouldn't have been and a true fix stopped something that was allowing it to work. But again, bypassing the FEX works, not a viable solution in my network. The FEX are 2232"

 

https://www.dell.com/community/Compellent/Dell-Compellent-SC4020-with-Nexus-9372-Networking/td-p/5170237

 

We are currently on NX-OS 7.0.3.I7.3 and are thinking of changing versions once this is confirmed.

 

Regards,

 

 

 

dhimaar
Level 1

Hey Michael,

Were you able to resolve the slowness issues with the 3172?