Slow Transfer Speeds over Nexus 5508

Rory Hamaker
Level 1

Guys and gals, I am going to put this post here as well. I have put it up on the VMware forums too, but I'm not really sure where I need to start looking.

I have 4 ESXi hosts that each have 4x 10Gb connections. These connections go back to a Cisco Nexus 5508 switch, where the ports are set up as trunks. The 4 connections on each server are set up in a distributed switch where I have my VLANs configured, and all of that seems to be working fine. My storage controller has a similar setup: 2x 10Gb ports trunked back to the Nexus (the same switch as the ESX hosts), and it uses hybrid SSD and 10k drives.

Looking just at the numbers, the network between the hosts and the storage should be smoking fast, but it isn't. I have been trying, without success, for 3 days now to create a new pool in Horizon View, and it keeps timing out with the "operation took longer" blah blah blah error. I can see in vCenter that the process of creating the replica is taking about an hour and a half. The original disk it is replicating is about 50GB, and given the amount of bandwidth I have allocated to this, the process should take under 10 minutes.

I am banging my head against a wall trying to track this down and don't know where to even start looking. Is it a vCenter setting, a Horizon setting, or a misconfigured switch along the way? I might add that we are using a 3560 L3 switch as a gateway and VLAN router, so my first thought is that maybe for some reason the traffic is leaving the 10G switch and visiting the 1G switch for a quickie, then being sent back home to be directed out another port. I have checked, and my Nexus is set up for jumbo frames; my 1G switch is not, but the jumbo traffic shouldn't ever cross over there.

I will take any help you guys can give and will provide any logs you need as well; just let me know.

12 Replies

Walter Dey
VIP Alumni

Are you using LACP from your DVS?

What is the load balancing algorithm on the DVS (IP hash, or originating virtual port ID)?

I am using originating virtual port for load balancing. Excuse my ignorance, but I am unfamiliar with what LACP is or where I can check that setting.

Steve Fuller
Level 9

Hi Rory,

What's the setup in terms of VMkernel interfaces on your ESX hosts? Do you have separate vmk interfaces for host management, NAS, vMotion, etc., and if so, can you provide details of the IP addresses being used?

You mention you're using a distributed switch with the four 10GE interfaces assigned to it. I guess you've configured the VDS for jumbo frames, e.g., a 9000-byte MTU.

For whichever VMkernel interface on the ESX hosts is used as the connection to your NAS, does that interface have an IP address in the same VLAN as the IP address on your NAS? If so, that traffic should stay local to the N5K as you say, but if it's in a different VLAN it's very likely going via the Catalyst 3560.

Do you have, or can you get, access to the shell on your ESXi servers? If so, can you run vmkping -I <vmk_interface> -d -s 8972 <nas_ip_address> from your ESX servers to verify you have jumbo frame capability end-to-end between your ESX hosts and the NAS? The 8972-byte value assumes you're using a 9000-byte MTU; if it's some other value, adjust the value after the -s in the vmkping command to MTU minus 28 bytes.
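To make the arithmetic concrete: 9000 bytes of MTU minus 20 bytes of IP header minus 8 bytes of ICMP header leaves 8972 bytes of payload, hence -s 8972. As a sketch, assuming the storage VMkernel interface is vmk1 (substitute your own interface name and NAS address):

~ # vmkping -I vmk1 -d -s 8972 <nas_ip_address>

The -d flag sets the don't-fragment bit, so the ping succeeds only if every device in the path can carry the full 9000-byte frame.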

Regards

OK, so the hosts are set up with two VMK interfaces. One is set up for management on a 192.168.1.x VLAN and the other for vMotion on a 192.168.30.x VLAN (which is also what storage is on). The VDS MTU is set to 9000, as are the VMKs for vMotion, but not for management.

I am going to have to check which interface is being used for the storage connection; I don't recall a setting in vCenter that lets me specify which interface storage traffic uses. That being said, vMotion and storage are on the same VLAN, so in that regard the traffic shouldn't leave the N5K for any reason.

I shelled into one of the ESX hosts this morning and was unable to perform a ping using the command you provided. It errored out saying the packets were too big.

Hi,

Is the message the same as the following:

~ # vmkping -I vmk0 -d -s 8972 192.168.1.51
PING 192.168.1.51 (192.168.1.51): 8972 data bytes
sendto() failed (Message too long)
sendto() failed (Message too long)

--- 192.168.1.51 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

 

If so then I think the ESX host still considers the MTU to be 1500 bytes rather than 9000 bytes. You can prove that by doing the same test and using -s 1472, which should work, and then -s 1473, which should fail with the same message as above.
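As a sketch using the same interface and target as the example above (adjust to your setup), the pair of tests would be:

~ # vmkping -I vmk0 -d -s 1472 192.168.1.51    (should succeed: 1472 + 28 = 1500)
~ # vmkping -I vmk0 -d -s 1473 192.168.1.51    (should fail with "Message too long")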

When using jumbo frames you need to enable this in two places: on the distributed switch itself and on the VMkernel port. The process is shown in the VMware KB article Enabling Jumbo Frames on virtual distributed switches.
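As a sketch of how to check and fix the VMkernel side from the ESXi shell (the VDS MTU itself is changed from the vCenter client under the distributed switch's properties; the vmk name here is an assumption):

~ # esxcli network ip interface list
~ # esxcli network ip interface set -i vmk1 -m 9000

The first command shows the current MTU of each vmk interface; the second sets vmk1 to a 9000-byte MTU.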

What's odd, if the ESX server is using a 1500-byte MTU, is that when it establishes a TCP connection to the NAS it should use a TCP Maximum Segment Size (MSS) appropriate for that MTU. Ignoring TCP options, this would be somewhere in the order of 1460 bytes (1500 bytes minus 40 bytes for the IP and TCP headers).

Regards

OK, so the results are exactly what you described. vmkping with a size over 1472 fails (I don't get the "message too long" error every time; usually it just fails). I went through the KB again and verified my settings: my dvSwitch has the MTU set to 9000, and the virtual adapter for each host has the MTU at 9000 on the vMotion interface but 1500 on the management interface.

Hi Rory

- What kind of IP storage protocol are you using? iSCSI and/or NFS?

- The fact that the 1472-byte ping fails: you don't have a jumbo frame path end to end!

- What is your expectation for network performance with jumbo versus standard Ethernet frames? It is very dependent on the frame size distribution; ±20% is what I have seen.

- I don't think jumbo frames are the answer to your performance problem.

Walter.

 

Hi Walter,

Thanks for joining back in.

I'd agree with your second and third points (though, being pedantic, the 1472-byte ping works OK and the 1473-byte ping fails), but on your last point, I think this may well be a frame size issue.

We know there's connectivity between the ESX hosts and the NAS, and we can see that with the ping. Where I think the problem lies is that both the ESX host and the NAS are working on the basis that they can support jumbo frames, but, as you say, there's no end-to-end jumbo frame capability.

When the TCP connection for NFS is established between the ESX host and the NAS, both will advertise an MSS of ~8960 bytes and establish the TCP connection on that basis. This is not a problem as long as there are only small packets between them, e.g., NFS heartbeats. The problem comes when they try to transfer large volumes of data using packets up to the TCP MSS (plus the TCP/IP headers), and something in the Layer 2 path drops that traffic. As the path between the ESX host and the NAS is Layer 2, there's no indication of the problem back to either end, simply a silent discarding of traffic.
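One way to see this on the wire (a sketch, assuming ESXi's bundled tcpdump-uw and that the storage traffic uses vmk1) is to capture the TCP handshake on the storage vmk and look at the MSS each side advertises in its SYN:

~ # tcpdump-uw -i vmk1 -nn -v 'tcp[tcpflags] & tcp-syn != 0 and host <nas_ip_address>'

If both ends advertise mss 8960 but bulk transfers stall while small packets get through, that's consistent with jumbo frames being negotiated at the endpoints and silently dropped somewhere in the Layer 2 path.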

Rory

Is it possible to get the output of a couple of commands from the CLI of your ESX host? If so, can you post the output from the esxcli storage nfs list, esxcli network ip interface ipv4 get and esxcli network ip route ipv4 list commands?

And from the Nexus 5K can you capture and post the output of the show policy-map type network-qos command?

Regards

@Walter:

We are using NFS for storage. The unit we have is for VMware storage only and is only connected to our ESX hosts.

I don't have a frame-size performance expectation per se; I am just trying to get the traffic to flow faster (at least faster than what it feels like it is now).

@Steve:

Sure:

~ # esxcli storage nfs list
Volume Name  Host           Share           Accessible  Mounted  Read-Only  Hardware Acceleration
-----------  -------------  --------------  ----------  -------  ---------  ---------------------
Tintri       192.168.30.9   /tintri               true     true      false  Not Supported
Tintri-2     192.168.30.10  /tintri               true     true      false  Not Supported

~ # esxcli network ip interface ipv4 get
Name  IPv4 Address   IPv4 Netmask   IPv4 Broadcast  Address Type  DHCP DNS
----  -------------  -------------  --------------  ------------  --------
vmk0  192.168.1.80   255.255.255.0  192.168.1.255   STATIC           false
vmk1  192.168.30.80  255.255.255.0  192.168.30.255  STATIC           false

~ # esxcli network ip route ipv4 list
Network       Netmask        Gateway        Interface  Source
------------  -------------  -------------  ---------  ------
default       0.0.0.0        192.168.1.254  vmk0       DHCP
192.168.1.0   255.255.255.0  0.0.0.0        vmk0       MANUAL
192.168.30.0  255.255.255.0  0.0.0.0        vmk1       MANUAL

From the Nexus:

2097_nexus# show policy-map type network-qos

  Type network-qos policy-maps
  ===============================

  policy-map type network-qos jumbo
    class type network-qos class-default
      mtu 9216
      multicast-optimize
  policy-map type network-qos default-nq-policy
    class type network-qos class-default
      mtu 1500
      multicast-optimize
  policy-map type network-qos fcoe-default-nq-policy
    class type network-qos class-fcoe
      pause no-drop
      mtu 2158
    class type network-qos class-default
      mtu 1500
      multicast-optimize

Thanks Rory,

We can see the policy-map with jumbo frame support configured, but this doesn't show whether it's applied. If you run the command show queuing interface Ethernet <x>/<y>, do you see "HW MTU: 9216" under the qos-group?

The other way to check is to run the command show run | sec "system qos" (including the quotes) and verify you've got the command service-policy type network-qos jumbo configured.
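For reference, a sketch of what the applied configuration usually looks like on a Nexus 5000 when jumbo frames are enabled globally (your policy-map name may differ):

policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo

The policy-map on its own does nothing; it only takes effect once it's attached under system qos with the service-policy command.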

Regards

Based on the first command you requested, here is the output for the queuing interface:

2097_nexus# show queuing interface e1/29

Ethernet1/29 queuing information:
  TX Queuing
    qos-group  sched-type  oper-bandwidth
        0       WRR            100

  RX Queuing
    qos-group 0
    q-size: 470080, HW MTU: 9216 (9216 configured)
    drop-type: drop, xon: 0, xoff: 470080
    Statistics:
        Pkts received over the port             : 2800381592
        Ucast pkts sent to the cross-bar        : 2546890739
        Mcast pkts sent to the cross-bar        : 253490853
        Ucast pkts received from the cross-bar  : 1744194204
        Pkts sent to the port                   : 2966774719
        Pkts discarded on ingress               : 0
        Per-priority-pause status               : Rx (Inactive), Tx (Inactive)

  Total Multicast crossbar statistics:
    Mcast pkts received from the cross-bar      : 1222580515

The Nexus switch looks to have the correct MTU set. And we can see the ESX host has its vmk1 interface in the 192.168.30.0/24 subnet and NFS mounts to the NAS at 192.168.30.9 and 192.168.30.10. So the NAS traffic should always use the vmk1 interface, as that's the best route from the host to the NAS.

One thing you said earlier looks odd:

"vmkping with a size over 1472 fails (I don't get the 'message too long' error every time; usually it just fails)."

Can you clarify what you mean when you say you don't see "message too long" every time?

For the vmk0 interface, which has an MTU of 1500 bytes, you should see the error message every time as soon as you specify 1473-byte pings. For the vmk1 interface, which has an MTU of 9000 bytes, you should see the error message as soon as you specify 8973-byte pings.

If you see this message at all when specifying the vmk1 interface and a payload size of 8972 bytes or less, then something on the host does not have its MTU set to 9000.
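Putting that together with the addresses from your earlier output (a sketch; the gateway address for the vmk0 test is taken from your route table), the four boundary tests would be:

~ # vmkping -I vmk0 -d -s 1472 192.168.1.254    (should succeed: vmk0 is MTU 1500)
~ # vmkping -I vmk0 -d -s 1473 192.168.1.254    (should fail)
~ # vmkping -I vmk1 -d -s 8972 192.168.30.9     (should succeed: vmk1 is MTU 9000)
~ # vmkping -I vmk1 -d -s 8973 192.168.30.9     (should fail)

If the vmk1 test already fails at 8972, the MTU problem is on the host or in the network path rather than a vCenter or Horizon setting.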

To verify that the MTU settings of the uplinks (which reflect the VDS MTU) and the VMkernel interfaces are correct, can you run the esxcli network nic list and esxcli network ip interface list commands respectively?
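A sketch of what to look for in that output:

~ # esxcli network nic list             (MTU column: the 10GE uplinks should read 9000)
~ # esxcli network ip interface list    (MTU field: vmk1 should read 9000)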

Regards