Cisco UCS - Slow Performance

Heinz Kern
Level 1

We are seeing strange behaviour from our virtual servers on Cisco UCS blades with 6100 Fabric Interconnects.

  • When we place the virtual server on a standalone HP server (with ESX) we get proper performance (even though the connection speed is only 1 Gbit/s)
  • When we move the virtual server to Cisco UCS (where we have a 10 Gbit/s connection) we get 1/10th of the performance

The performance is good overall as long as we stay within the LAN (low delay). As soon as we copy via the WAN (10 Gbit/s connection with an RTT of approx. 7 ms) we see this big difference between the virtual server on the old HP and on the new UCS.

We had a look at the traces and we see that the virtual server on Cisco UCS is not able to use the whole TCP window. During the TCP handshake a window scaling factor is negotiated, but during the copy the server does not seem able to use it (it never has more than 16060 unacknowledged bytes in flight). As soon as this window is full, it has to wait 7 ms for the acknowledgement.
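As a rough sanity check on these numbers, the sketch below works out the throughput ceiling that a 16060-byte in-flight limit imposes at a 7 ms RTT, and the window that would be needed to fill a 10 Gbit/s path. The function names are made up for illustration, and the model ignores slow start, loss and protocol overhead; it is only the basic window/RTT arithmetic.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput when at most window_bytes are in flight."""
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a link of link_bps busy."""
    return link_bps * rtt_seconds / 8


# Values taken from the traces described above.
WINDOW_BYTES = 16060
RTT_SECONDS = 0.007

ceiling = max_throughput_bps(WINDOW_BYTES, RTT_SECONDS)
needed = bandwidth_delay_product_bytes(10e9, RTT_SECONDS)

print(f"Throughput ceiling: {ceiling / 1e6:.1f} Mbit/s")
print(f"Window needed for 10 Gbit/s: {needed / 1e6:.2f} MB")
```

With these inputs the ceiling comes out at roughly 18 Mbit/s, while filling the 10 Gbit/s WAN link would need about 8.75 MB in flight. That is why the limit goes unnoticed on the low-delay LAN but dominates across the WAN.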

Any ideas why this can happen on UCS and what we could change?

thx


brianaxford
Level 1

We are having the same issues with our UCS. 1/10th the performance. Did you find a solution?

No, not yet. A TAC case is open, but the engineer hasn't produced any useful output so far.

We assume that it is a problem with TCP windowing (the problem only occurs when we transfer data over a long distance with several milliseconds of delay), but the root cause hasn't been found yet.

I had a very similar issue for months, and in the end it took only two clicks of the mouse to fix all my problems:

I changed the load balancing policy from "Route based on IP hash" to "Route based on the originating virtual port ID" and that was it.

Interesting. I will keep it in mind when we face problems again.

And most important: it had nothing to do with UCS; as a matter of fact, the same behaviour would show up with any other solution.

Correct, it is not exactly a UCS issue, but apart from the ESXi configuration it can be related to how the upstream switch is configured.

My Nexus 5Ks were set to an LACP port channel, and the VMware standard vSwitch does not support LACP.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004048

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2006129

The big misunderstanding: Blade (vswitch) to FI communication:

Route based on IP hash does not work with UCS blades: the Fabric Interconnects don't support vPC.

The same MAC address constantly moves between the two fabrics, bringing performance down.

I second this. You would only ever use IP hash if your upstream network supported a static port channel of some description. With UCS, both FIs are independent bridges (from a port channel perspective), so a connection to each FI could never support IP hash.
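To illustrate why this matters, here is a small sketch comparing the two teaming policies. It is a simplified model, not the actual ESXi implementation: the hash (XOR of source and destination address, modulo the number of uplinks) and the function names are assumptions for illustration only. The point is that with IP hash a single VM can be sent out of either uplink depending on the destination, so its MAC address shows up on both fabrics, whereas with port ID the VM stays pinned to one uplink.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Simplified IP-hash model: uplink depends on both source and destination."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_uplinks


def port_id_uplink(virtual_port_id: int, n_uplinks: int) -> int:
    """Simplified port-ID model: uplink depends only on the VM's virtual port."""
    return virtual_port_id % n_uplinks


# Hypothetical addresses; uplink 0 goes to Fabric A, uplink 1 to Fabric B.
vm_ip = "10.0.0.10"
destinations = ["10.0.1.20", "10.0.1.21"]

for dst in destinations:
    print(dst, "IP hash ->", ip_hash_uplink(vm_ip, dst, 2),
          "| port ID ->", port_id_uplink(7, 2))
```

In this model the two destinations land on different uplinks under IP hash, while the port ID policy picks the same uplink for both. Without a port channel spanning both FIs, the first case makes the upstream network see the same MAC on two different paths.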

Hi vladmcisco,

 excellent ... thanks for sharing ... very important. (+5 for that).

Regards.

Did you ever find the root cause? I have a TAC case open with the same issue too, but they are lost...

Cheers

Andrew - What is the SR #? I'll take a look at it.

Robert

Hi Robert, the case # is 626340579. Just let me know if you need any more detail.

Cheers

Andrew,

A few comments/questions about your issue.  It's interesting we see the issue in one direction only.  I'd bet this is a driver or UCS configuration issue.

1. Regarding virtual systems involved in the tests on both sides, are these on UCS blades on both sites or just the one site?

2. I need a topology for the tests you ran. Include any and all devices in the path from source to destination.  The difference in behavior is likely going to involve different devices and/or paths.  I see mention of an N7K, but I don't see the topology.

3. Please gather a UCSM show tech and upload it to your case.  I'd like to see your QoS settings.  Any chance you're applying QoS at the UCS level which could affect the outbound traffic (but not inbound)? This is a common issue people overlook.

4. Were the ESX VIC enic drivers on your UCS ESX hosts ever updated as recommended by the engineer?  I don't see any confirmation in the SR notes either way. 

5. Have you looked for drops or pauses throughout the path from the UCS source vNIC to the UCS egress?

This could also be a TCP windowing issue.  How is the 1 GB file transfer being done, just a simple SMB transfer?  You might want to run iperf and test with UDP to confirm whether this is a windowing issue with TCP.

Let's start there.  With the above answered/provided I'm confident we can progress your case forward.

Regards,

Robert

Hi Robert,

Hope the below helps..

1. Regarding virtual systems involved in the tests on both sides, are these on UCS blades on both sites or just the one site? - The UCS is only on one side, and the issue only occurs when traffic egresses the UCS towards either the LAN (physical servers also connected to the N7K) or across the WAN to either physical or virtual servers (IBM hardware platforms, not Cisco).

2. I need a topology for the tests you ran. Include any and all devices in the path from source to destination.  The difference in behavior is likely going to involve different devices and/or paths.  I see mention of an N7K, but I don't see the topology. - The UCS has 2 FIs, Fabric A and Fabric B, each connected via 6x1Gb links running as LACP port channels to the N7K. Physical servers are also attached to the same N7K. It's a fairly simple topology.

3. Please gather a UCSM show tech and upload it to your case.  I'd like to see your QoS settings.  Any chance you're applying QoS at the UCS level which could affect the outbound traffic (but not inbound)? This is a common issue people overlook. - Will upload this now.

4. Were the ESX VIC enic drivers on your UCS ESX hosts ever updated as recommended by the engineer?  I don't see any confirmation in the SR notes either way. - Yes, I did this and it had no effect.

5. Have you looked for drops throughout the path from UCS source vNIC to egress UCS for any drops or Pauses? - I don't see any drops etc., only more variable latency from the servers inside the UCS.

Hope this helps for starters. Thanks
