Cisco 2960, 3560, 3750 switch series bidirectional throughput problem

Balazs Gyorgy
Level 1

Hi Cisco Magi,

I have a serious issue here which has been going on for a long time, and I couldn't find a solution for it so far.

The problem affects at least the 2960, 3560 and 3750 series switches. There are no issues if all the interfaces on the switches run at 1G. However, if I pick two laptops, one running at 100 Mbit and the other at gigabit, I can't get 100 Mbit throughput in both directions at the same time. Traffic in one direction is fine (about 93 Mbit/s), but the other direction is about 7-10 Mbit.

As a lab test I tried to isolate the problem with two 2960 switches (IOS c2960-lanbasek9-mz.122-55.SE6, models WS-C2960-48TT-L and WS-C2960-24TT-L). Test topology: Laptop 1 --> 100 Mbit switchport on switch 1 --> switch 1 --> gigabit link --> switch 2 --> gigabit switchport on switch 2 --> Laptop 2
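For reference, the tests were run with iperf, roughly like this (I'm reconstructing the exact flags from memory, so treat them as approximate):

  # On Laptop 2 (gig side), start the iperf server:
  iperf -s

  # On Laptop 1 (100 Mbit side), run a simultaneous bidirectional test;
  # -d makes the server open a second connection back at the same time:
  iperf -c <laptop2_ip> -d -t 30 -i 1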

The test shows the same results if I use only one switch with a 100 Mbit and a gigabit port. Please note that it affects only TCP traffic (tested with iperf and FTP). However, if I manually configure the gigabit port to speed 100, the problem disappears and I get perfect results in both directions.
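To be explicit, the workaround is nothing more than this (the gig port number here is just an example):

  conf t
  interface GigabitEthernet0/1
   ! Forcing the port down to Fast Ethernet makes the problem disappear:
   speed 100
  end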

Tried with and without mls qos, same results. Also tried adjusting interface buffers, hold queues, and queue-set buffers and thresholds, with no success.
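For anyone who wants to reproduce the tuning attempts, the commands tried were along these lines (the values are examples only, not a recommendation):

  ! Give output queue-set 1 more buffers and higher drop thresholds:
  mls qos queue-set output 1 buffers 15 70 5 10
  mls qos queue-set output 1 threshold 2 3100 3100 100 3200

  interface FastEthernet0/24
   ! Deeper software hold queue on the congested egress port:
   hold-queue 240 out
   queue-set 1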

Maybe it is just a trivial config change that I'm not aware of, but it is really annoying. Any help or suggestion is appreciated.

Thanks,

Balazs

Joseph W. Doherty
Hall of Fame

What OS are your hosts running?

Reason I ask: Windows XP, by default, sets the TCP RWIN to 64 KB when connected to a gig port, but to only about a quarter of that when connected at 100 Mbps.
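If XP were involved, the receive window can be pinned manually to rule this out; the usual knob is the TcpWindowSize registry value (takes effect after a reboot):

  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
    TcpWindowSize (DWORD), e.g. 0x00010000 for 64 KB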

Balazs and Joseph,

Is there perhaps a possibility of the NICs using an MTU of 9K when running at 1 Gbps, and 1.5K at lower speeds? In that case it would result in MTU mismatches and possibly packet drops.

Best regards,

Peter

Peter, that's an interesting question!

Hi Peter,

I double-checked the MTUs on both interfaces, and they show 1500 bytes. DF bits are not set in the packets. Unfortunately it is not simply an MTU problem, nor a speed or duplex issue.
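For completeness, this is roughly how I checked (interface names on the laptops may differ):

  # On the Debian laptops the negotiated MTU shows up in ip link:
  ip link show eth0
  # ... mtu 1500 qdisc pfifo_fast state UP ...

  ! And the switch-side setting, for comparison:
  show system mtu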

I ran several tests on different switches (2960 series) using different laptops, with no success...

Hi,

Did you ever find a solution to this?

Thanks

Hi Joseph,

Both computers run Debian 6.0 as the OS. I forgot to mention in my original post that years ago I ran tests with laptops running Win XP and Ubuntu 9.10 on 3750 switches, unfortunately with the same results.

Moreover, both laptops have gig ports, and if I run a dual iperf test between them over a single cat5 cable I get 900/900 Mbit/s.

So I would say it is not directly related to end-host TCP windowing.
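For what it's worth, the Debian window autotuning limits look sane too (the values below are typical Debian 6 defaults, quoted from memory):

  # min, default and max receive/send window sizes, in bytes:
  sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
  # net.ipv4.tcp_rmem = 4096 87380 4194304
  # net.ipv4.tcp_wmem = 4096 16384 4194304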

Hi Joseph,

The problem occurs in the direction from the gig interface towards the 100M port; basically the "download" of the laptop on the 100 Mbit port is what suffers.

No errors or input drops in the interface stats on the Fa port:

FastEthernet0/24 is up, line protocol is up (connected)
  Hardware is Fast Ethernet, address is 001b.537f.3518 (bia 001b.537f.3518)
  MTU 1600 bytes, BW 100000 Kbit, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 100Mb/s, media type is 10/100BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:01, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1816
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  30 second input rate 0 bits/sec, 0 packets/sec
  30 second output rate 0 bits/sec, 0 packets/sec
     3639376 packets input, 4425186564 bytes, 0 no buffer
     Received 40 broadcasts (28 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 28 multicast, 0 pause input
     0 input packets with dribble condition detected
     2989503 packets output, 2594906882 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out

The only drops I see are in the queues, but I believe that's normal due to the initial burst of the gig traffic:

sh mls qos interface fa0/24 statistics

FastEthernet0/24 (All statistics are in packets)

  dscp: incoming
-------------------------------
  0 -  4 :     3639352            0            0            0            0
  5 -  9 :           0            0            0            0            0
 10 - 14 :           0            0            0            0            0
 15 - 19 :           0            0            0            0            0
 20 - 24 :           0            0            0            0            0
 25 - 29 :           0            0            0            0            0
 30 - 34 :           0            0            0            0            0
 35 - 39 :           0            0            0            0            0
 40 - 44 :           0            0            0            0            0
 45 - 49 :           0            0            0            0            0
 50 - 54 :           0            0            0            0            0
 55 - 59 :           0            0            0            0            0
 60 - 64 :           0            0            0            0

  dscp: outgoing
-------------------------------
  0 -  4 :     2940604            0            0            0            0
  5 -  9 :           0            0            0            0            0
 10 - 14 :           0            0            0            0            0
 15 - 19 :           0            0            0            0            0
 20 - 24 :           0            0            0            0            0
 25 - 29 :           0            0            0            0            0
 30 - 34 :           0            0            0            0            0
 35 - 39 :           0            0            0            0            0
 40 - 44 :           0            0            0            0            0
 45 - 49 :           0            0            0            0            0
 50 - 54 :           0            0            0            0            0
 55 - 59 :           0            0            0            0            0
 60 - 64 :           0            0            0            0

  cos: incoming
-------------------------------
  0 -  4 :     3639399            0            0            0            0
  5 -  7 :           0            0            0

  cos: outgoing
-------------------------------
  0 -  4 :     2941529            0            0            0            0
  5 -  7 :           0            0            0

  output queues enqueued:
 queue:    threshold1   threshold2   threshold3
 -----------------------------------------------
 queue 0:          28           0           0
 queue 1:      522537        3050     1707057
 queue 2:           0           0           0
 queue 3:           0           0      760998

  output queues dropped:
 queue:    threshold1   threshold2   threshold3
 -----------------------------------------------
 queue 0:           0           0           0
 queue 1:         367           0        1249
 queue 2:           0           0           0
 queue 3:           0           0         200

Policer: Inprofile:            0 OutofProfile:            0

I would say it is a queuing/memory problem, or configuration fine-tuning that I'm just not aware of. Maybe it is a well-known issue, but I could not find any reference to it. It is hard to believe that I am the only one who has experienced this so far.

I'll provide any information or output that can lead toward a solution.

Thanks for the help on this so far, guys.

I've been thinking about this today, and the most likely explanation seems to be that, going from gig to 100, the former is overrunning the latter and encountering drops. I would expect performance somewhat below that of a 100-to-100 Mbps transfer, but what's surprising is how much your transfer rate is reduced. I would expect up to around a 25% reduction, not 90%.

Off the top of my head, I could see the sender waiting on timeouts greatly slowing the transfer rate, but I can't see a reason why you would be hitting many timeouts.

Regarding TCP RWIN - that still might be a factor, although on the receiving host of the gig-to-100 Mbps transfer it might be too large. I.e. making it smaller might improve gig to 100 Mbps.
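An easy way to try that without touching the OS settings might be iperf's own window option (the 64 KB below is just a starting point to experiment with):

  # Cap the TCP window on both ends of the dual test:
  iperf -s -w 64K
  iperf -c <laptop2_ip> -d -w 64K -t 30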

Ever get a packet capture of one of these transfer tests?
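If not, something like this on the 100 Mbps laptop would do (interface name assumed):

  # Capture just the headers so the file stays manageable:
  tcpdump -i eth0 -s 96 -w iperfcapture.pcap host <laptop2_ip>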

Hi Joseph,

I uploaded a pcap capture file if you want to have a look at it; it is 340 MB though...

The link is:

http://www.spidernet.net.au/iperfcapture.pcap

Thanks,

Balazs

Joseph W. Doherty
Hall of Fame

In your test topology, which direction is showing the low transfer rate? I.e. towards the host on 100 or towards the host on gig?

Do any of the switch ports record drops, for instance on the 100 Mbps egress?

Balazs Gyorgy
Level 1

Hi Guys,

Any other ideas? I saw a couple of similar posts, but without a correct answer. I would be keen to know if anyone has at least been able to reproduce this fault.

Balazs Gyorgy
Level 1

One more thing. Maybe I haven't emphasised it enough yet: the problem occurs only if the traffic flows in both directions at the same time. If I run an iperf test in a single direction the results are absolutely perfect. The problem only appears when a dual test is run.

I haven't had a chance to look at your pcap file, and didn't have any ideas until seeing your last post. If you're transmitting at full capacity in both directions, gig to 100 Mbps should congest on egress, but from what you're now describing, you'll also have contention between the main test flow and the other direction's TCP ACK flow. If you QoS protect the ACKs, I'm wondering whether you would see a difference.
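Untested, but on your 2960 the experiment might look something like this (the names and port numbers are mine, and it assumes iperf's default TCP port 5001, so the upload flow's ACKs can be singled out by their source port):

  ! Mark the upload flow's ACKs as they arrive from the gig-attached host:
  ip access-list extended UPLOAD-ACKS
   permit tcp any eq 5001 any

  class-map match-all ACKS
   match access-group name UPLOAD-ACKS

  policy-map PROTECT-ACKS
   class ACKS
    set dscp ef

  interface GigabitEthernet0/1
   service-policy input PROTECT-ACKS

  ! Map EF (DSCP 46) into queue 1 and make that the expedite queue on the
  ! congested 100 Mbps egress:
  mls qos srr-queue output dscp-map queue 1 threshold 3 46
  interface FastEthernet0/24
   priority-queue out

If the gig-to-100 download recovers with that in place, it would point squarely at ACK loss in the Fa0/24 egress queue.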

Hi Joseph,

I was thinking the same regarding the TCP ACKs; maybe the ACKs don't reach the other end, causing retransmissions. However, as I mentioned in the original post, if I issue a speed 100 command on the gig interface, causing it to negotiate Fa with that laptop, the problem disappears. If both ports are Fa ports I get approximately 100 Mbit/100 Mbit during a dual test, as expected from a Cisco switch. The results are good between two gig ports as well. The only scenario where the traffic is asymmetric is when one of the ports is at 1 Gig and the other is at 100 Mbit.

I hope I could clarify the problem a little bit more.

I found a similar topic here:

http://www.sadikhov.com/forum/index.php?/topic/165691-me3750s-wred-threshold-es-port/

Unfortunately there was no solution there either. I tried dozens of threshold and buffer allocation combinations, with no success...
