04-09-2013 01:06 PM
We are new to the Nexus line. Two servers were connected to a Cisco Nexus 5548 with 10 Gb NICs. The server group is saying that some database copies are taking just as long as they did on the 1 Gb cards, but some of the SQL database copies were up to four times faster.
Here is the basic config we are using. All interfaces are configured the same as int e1/1, and only the SNMP and username/password lines have been removed.
Any suggestions on a best-practice config for Windows Server 2012 would be appreciated.
Thanks,
Tom
version 5.1(3)N2(1)
hostname Nex-Comp-center-10.23.10.2
feature telnet
no feature http-server
feature interface-vlan
feature lldp
feature vtp
ip domain-lookup
class-map type qos class-fcoe
class-map type queuing class-fcoe
match qos-group 1
class-map type queuing class-all-flood
match qos-group 2
class-map type queuing class-ip-multicast
match qos-group 2
class-map type network-qos class-fcoe
match qos-group 1
class-map type network-qos class-all-flood
match qos-group 2
class-map type network-qos class-ip-multicast
match qos-group 2
ntp server 192.168.23.21 prefer
vrf context management
interface Vlan1
interface Vlan23
no shutdown
management
ip address 10.23.10.2/16
interface Ethernet1/1
description server port
switchport access vlan 23
spanning-tree port type edge
*** All 32 interfaces same as above ***
interface mgmt0
shutdown force
clock timezone EST -5 0
clock summer-time EST 2 Sunday March 02:00 1 Sunday November 02:00 60
line console
line vty
session-limit 30
session-limit 30
boot kickstart bootflash:/n5000-uk9-kickstart.5.1.3.N2.1.bin
boot system bootflash:/n5000-uk9.5.1.3.N2.1.bin
ip route 0.0.0.0/0 10.23.0.1
04-10-2013 02:56 AM
Hi Tom,
Can you provide some more details of the topology and how the servers are connected? Assuming dual 10GE NICs per server:
- are they both physically connected to the same switch or do you have a pair of Nexus switches with one NIC connected to each?
- if you have a pair of Nexus switches, what is the connectivity between them?
- are the servers configured for NIC teaming and if so which teaming algorithm are they using?
- have you tried using a single NIC on the servers to see if the performance is then consistent?
In terms of general questions on the switch(es): what is the connection type, i.e., fibre or copper, and do you see any errors on the switch interfaces to which the servers connect?
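If it helps, these are the sort of NX-OS commands I would use to check the server-facing ports for errors and drops (a sketch; the exact output format varies by platform and release):

```shell
! On the Nexus 5548, per-port error and drop counters for a server-facing port
show interface ethernet 1/1
show interface ethernet 1/1 counters errors
show queuing interface ethernet 1/1
```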
It's odd that you're seeing performance improvements for some copies and not others. Are you able to do some network performance testing with a tool such as iperf? As per the website:
Iperf was developed by NLANR/DAST as a modern alternative for measuring maximum TCP and UDP bandwidth performance.
If you can get some results using this tool we could at least see what the raw TCP throughput is like which might help us understand whether the issue is in the network, or the application used for copying.
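For reference, a minimal iperf (version 2) run looks like this; "server1" here is a placeholder hostname:

```shell
# On the receiving server
iperf -s

# On the sending server: single stream, 10-second test
iperf -c server1

# At 10G, parallel streams (-P) and a larger window (-w) usually matter
iperf -c server1 -P 4 -w 256K -t 30
```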
Regards
04-10-2013 11:51 AM
Steve,
Thanks for the quick response.
- Servers are connected to the same Nexus switch... we have only one
- Servers only have one NIC each... both servers have identical NICs
- No teaming used
- We are using SFP-H10GB twinax cables
We did an iperf test... but we were only able to get 500 Mbps, the same as we get on a 1 Gb port. I was thinking that iperf will not do 10G.
Thanks again... look forward to your thoughts now
11-06-2014 10:46 AM
Hi everybody.
While searching I found this discussion. I have the same problem, bad throughput, with this configuration:
2 servers UCS c200 m2, 1 Nexus 5548UP and Netapp Storage.
Each server has one Cisco VIC FCoE 10Gb NIC attached with a twinax cable to the Nexus. The transfer rate between servers or VMs doesn't exceed 150 MBps in the best case.
I have Windows Server 2012 R2 DC and hyper-v. Any ideas for this?
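One thing worth double-checking is units: 150 MBps (megabytes per second) is roughly 1.2 Gbit/s on the wire, so the link is doing more than it first appears, though still well short of 10G:

```shell
# Convert an observed copy rate in MB/s to Gbit/s (approximate; ignores framing overhead)
awk 'BEGIN { mb_per_sec = 150; printf "%.1f Gbit/s\n", mb_per_sec * 8 / 1000 }'
```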
Thanks!!
04-10-2013 01:03 PM
Hi Tom,
You should be able to get more than 500 Mbps with iperf. If you open two cmd prompts on the same server, then in one window run iperf -s, and in the second window run iperf -c 127.0.0.1, this will tell you what the host CPU is capable of.
Perhaps run that test on both hosts to see if one is different from the other. Also, if you haven't already, run the iperf -c from server1 to server2, and from server2 to server1, to see if the performance is different in each direction.
As before, can you paste the show interface output from the switch for both interfaces the servers are connected to? The config here is pretty simple, so there should be nothing in the switch that would affect performance to this extent.
Regards
Sent from Cisco Technical Support Android App
04-10-2013 01:16 PM
Here are the show interface outputs for the ports the servers are connected to:
Ethernet1/1 is up
Hardware: 1000/10000 Ethernet, address: 547f.eed3.73a8 (bia 547f.eed3.73a8)
Description: server port
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is access
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Input flow-control is off, output flow-control is off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
Last link flapped 03:40:35
Last clearing of "show interface" counters never
30 seconds input rate 12854608 bits/sec, 2778 packets/sec
30 seconds output rate 8479280 bits/sec, 3004 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 11.70 Mbps, 2.62 Kpps; output rate 8.42 Mbps, 2.92 Kpps
RX
6480692782 unicast packets 405409 multicast packets 312472 broadcast packets
6481410663 input packets 7922519552448 bytes
0 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
3841484041 unicast packets 5443679 multicast packets 31992710 broadcast packets
3878920430 output packets 4867640205354 bytes
0 jumbo packets
0 output errors 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
8 interface resets
Ethernet1/2 is up
Hardware: 1000/10000 Ethernet, address: 547f.eed3.73a9 (bia 547f.eed3.73a9)
Description: server port
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is access
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Input flow-control is off, output flow-control is off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
Last link flapped 4d15h
Last clearing of "show interface" counters never
30 seconds input rate 6884072 bits/sec, 737 packets/sec
30 seconds output rate 2220096 bits/sec, 951 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 6.98 Mbps, 701 pps; output rate 2.66 Mbps, 933 pps
RX
376549352 unicast packets 13616 multicast packets 18926 broadcast packets
376581894 input packets 473976136125 bytes
0 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
220371600 unicast packets 1235973 multicast packets 7630000 broadcast packets
229237573 output packets 86591391784 bytes
0 jumbo packets
0 output errors 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
2 interface resets
Ethernet1/3 is up
Hardware: 1000/10000 Ethernet, address: 547f.eed3.73aa (bia 547f.eed3.73aa)
Description: server port
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is access
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Input flow-control is off, output flow-control is off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
Last link flapped 4d22h
Last clearing of "show interface" counters never
30 seconds input rate 7016 bits/sec, 1 packets/sec
30 seconds output rate 916768 bits/sec, 464 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 4.82 Kbps, 2 pps; output rate 894.85 Kbps, 425 pps
RX
58527321 unicast packets 3239364 multicast packets 4227 broadcast packets
61770912 input packets 28127937872 bytes
0 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
4147172947 unicast packets 1318113 multicast packets 8115147 broadcast packets
4156606207 output packets 6167969623679 bytes
0 jumbo packets
0 output errors 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
2 interface resets
Ethernet1/5 is up
Hardware: 1000/10000 Ethernet, address: 547f.eed3.73ac (bia 547f.eed3.73ac)
Description: server port
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is access
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Input flow-control is off, output flow-control is off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
Last link flapped 4d16h
Last clearing of "show interface" counters never
30 seconds input rate 1400 bits/sec, 1 packets/sec
30 seconds output rate 911136 bits/sec, 461 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 1.00 Kbps, 1 pps; output rate 891.08 Kbps, 424 pps
RX
1017185533 unicast packets 14712 multicast packets 261638 broadcast packets
1017461883 input packets 1540562658617 bytes
0 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
485206955 unicast packets 1257118 multicast packets 7519733 broadcast packets
493983806 output packets 49657770108 bytes
0 jumbo packets
0 output errors 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 0 output discard
0 Tx pause
1 interface resets
04-10-2013 01:15 PM
The other obvious option here is to connect the servers back-to-back (if possible) and check the performance. That will definitely rule out anything on the switch. The copper SFP+ cable you use to connect the server to the switch should work directly between the servers.
Regards
Sent from Cisco Technical Support Android App
04-10-2013 01:18 PM
The servers are in production... but if need be we could try that.
Always good to get back to basics....
Thanks
04-10-2013 01:43 PM
There are no errors or anything obvious in the show interface output. Would you be able to run the iperf tests I suggested? This would be non-disruptive, so it could be done without any maintenance window.
Regards
Sent from Cisco Technical Support Android App
04-11-2013 06:10 AM
I intend to do that as soon as I can get a Server Admin to let me in.
When setting up a NIC, do you recommend "auto" mode or forced 10G mode for the speed setting? The switch interfaces are set to auto. I know there are no errors, but one of the admins said they had problems with forced settings at 1 Gb.
Thanks,
Tom
04-11-2013 07:47 AM
We ran iperf in local mode and got 2 Gbps on one of the servers. We will be testing more at noon EST.
Also, FYI, the 10 Gb cards do not have a setting for speed and duplex.
A vendor recommended turning off "large send offload"
04-11-2013 08:23 AM
I typically see 9 Gbps+ on W2K8 64-bit with Emulex (10102) and QLogic (8242) NICs, as tested by IXIA Chariot. I have not tested Windows 2012, but some things that will slow you down are Chimney, RSS, TOE, MS clustering, and bad vendor drivers. Use netsh to view and change these parameters:
C:\>netsh int tcp show global
Querying active state...
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State : enabled
Chimney Offload State : automatic
NetDMA State : enabled
Direct Cache Acess (DCA) : disabled
Receive Window Auto-Tuning Level : normal
Add-On Congestion Control Provider : none
ECN Capability : disabled
RFC 1323 Timestamps : disabled
I have seen TOE implementations in vendors' drivers completely kill throughput; try turning that off. You may also need 2-4 iperf threads (-P) to fully utilize a 10G link.
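For reference, the netsh commands look something like this (a sketch; exact parameter support varies by Windows version, and TOE/LSO can often also be toggled per-adapter in the NIC driver properties):

```shell
:: View the current global TCP parameters
netsh int tcp show global

:: Disable TCP Chimney offload
netsh int tcp set global chimney=disabled

:: Disable Receive-Side Scaling if a driver issue is suspected
netsh int tcp set global rss=disabled
```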
Please rate helpful posts.
04-11-2013 12:45 PM
Darren,
Thanks... we turned off TOE and it increased the performance.
Our iperf testing from server to server is still low. We ran iperf -c 172.0.0.1 -P 2 and got 1.65meg.
04-11-2013 01:27 PM
Who is your 10G nic vendor?
Please rate helpful posts.
04-12-2013 03:22 AM
There has to be something odd with the servers or the OS. I can get 1.4 Gbit/s on a very old and croaky Dell PowerEdge 1750 (single Xeon CPU).
C:\Documents and Settings\Administrator\My Documents\bin\iperf>iperf -c 127.0.0.1
------------------------------------------------------------
Client connecting to 127.0.0.1, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 3] local 127.0.0.1 port 2784 connected with 127.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.64 GBytes 1.41 Gbits/sec
On a more recent and respectably specified server (albeit running Linux) I get 18 Gbit/s with a local iperf test.
[sfuller@rhel6-eth0 ~]$ iperf -c 127.0.0.1
------------------------------------------------------------
Client connecting to 127.0.0.1, TCP port 5001
TCP window size: 131 KByte (default)
------------------------------------------------------------
[ 3] local 127.0.0.1 port 33997 connected with 127.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 21.3 GBytes 18.3 Gbits/sec
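As an aside, note the default TCP window in those runs. A single TCP stream is limited to roughly window/RTT, so with iperf's default 64 KByte window even a modest RTT caps you far below 10 Gbit/s. This back-of-the-envelope calculation (the RTT values are my assumption) lands suspiciously close to the ~500 Mbps reported earlier:

```shell
# Single-stream TCP throughput cap = window / RTT (rule of thumb, assuming no loss)
# 64 KB window at 1 ms RTT:
awk 'BEGIN { printf "%.0f Mbit/s\n", 64 * 1024 * 8 / 0.001 / 1e6 }'
# 64 KB window at 0.1 ms RTT:
awk 'BEGIN { printf "%.2f Gbit/s\n", 64 * 1024 * 8 / 0.0001 / 1e9 }'
```

If that is the limiter, a larger -w or several -P streams should raise the number immediately.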
I assume these are recent servers and as such should be able to perform much better.
I came across the Microsoft Performance Tuning Guidelines for Windows Server 2012 and this has a section on network performance testing, using NTttcp instead of iperf, but the same principle should apply.
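For reference, a typical NTttcp run looks something like this (a sketch; flag syntax differs between NTttcp versions, so check ntttcp.exe -? first, and the IP address here is a placeholder for the receiver's address):

```shell
:: On the receiving server (8 sessions, any CPU, receiver's IP)
ntttcp.exe -r -m 8,*,192.168.23.10 -t 30

:: On the sending server
ntttcp.exe -s -m 8,*,192.168.23.10 -t 30
```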
Can you point your server team to this document and have them go through some of the testing?
Regards