Baffling WAN Throughput Issue

Peter Lyttle
Level 1

Hello,

I was hoping some of the gurus on here might be able to help me shed some light on a very strange problem (or problems) I have been seeing on a 100Mbit point-to-point connection.

Here is a network diagram explaining the topology and various testing points.

http://img141.imageshack.us/img141/9517/20111125100mbitdiag.png

The problem we are seeing is poor (and sometimes asymmetric) throughput. I understand that window size greatly affects throughput, especially on WAN links where latency can be an issue.

To narrow down the problem, we have started testing between points 2 and 3 on the diagram.

Point 2 -> 3 and 3 -> 2 Testing

  • L3 RFC2544 tests all pass: 99.99% throughput, no errors or frame loss. These tests were performed with a JDSU SmartClass Ethernet tester (not testing jumbo frames).
  • L3 throughput tests also pass at 97.9% (data) from the JDSU SmartClass Ethernet tester.
  • iPerf TCP (on Linux) and JPerf (on Windows) are where we start to see the strange issues. This testing was done by connecting a laptop directly to the NTE on both sides, both on the same subnet, so no routing is involved.

Default 64KB Window Size Tests

Based on a 64KB (65536-byte or 524288-bit) window size and a latency of 15ms, I would have expected a maximum single-connection throughput of 34.95Mbit/sec (524288 bits / 0.015 seconds). However, the results are as follows:

2 (Client) -> 3 (Server) JPerf test = Approx 7Mbit/sec

3 (Client) -> 2 (Server) JPerf test = Approx 16Mbit/sec
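The expected figures above are just the window divided by the round-trip time. A minimal Python sketch of that arithmetic, assuming the 15ms figure is the round-trip time (if it is one-way, the ceilings halve) and using decimal megabits; the helper name is illustrative only:

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one TCP connection limited only by its receive window.

    throughput <= window / RTT; loss, congestion control and link rate are ignored.
    """
    window_bits = window_bytes * 8
    return window_bits / rtt_s / 1e6  # decimal megabits per second


RTT_S = 0.015  # assumed round-trip time (15 ms)

for kb in (64, 512):
    rate = max_tcp_throughput_mbps(kb * 1024, RTT_S)
    print(f"{kb} KB window: {rate:.2f} Mbit/s")
```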

512KB Window Size Tests

Based on a 512KB (524288-byte or 4194304-bit) window size and a latency of 15ms, I would have expected a maximum single-connection throughput of 279.6Mbit/sec (4194304 bits / 0.015 seconds). Obviously my link is only 100Mbit, so that would be the bottleneck, but anything over 34Mbit was what I was expecting. The results are as follows:

2 (Client) -> 3 (Server) JPerf test = Approx 7Mbit/sec

3 (Client) -> 2 (Server) JPerf test = Approx 16Mbit/sec
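Turning the same formula around gives the bandwidth-delay product: the window needed to keep a 100Mbit/s link busy at a given RTT. A sketch under the same assumptions (15ms treated as the RTT, decimal units, hypothetical helper names), which also caps the window-limited figure at the link rate:

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return link_bps * rtt_s / 8


def expected_mbps(window_bytes: int, rtt_s: float, link_bps: float) -> float:
    """Window-limited single-connection rate, capped at the link rate."""
    return min(window_bytes * 8 / rtt_s, link_bps) / 1e6


LINK_BPS = 100e6  # 100 Mbit/s point-to-point circuit
RTT_S = 0.015     # assumed round-trip time in seconds

print(f"window needed to fill the link: {bdp_bytes(LINK_BPS, RTT_S):,.0f} bytes")
print(f"64 KB window:  {expected_mbps(64 * 1024, RTT_S, LINK_BPS):.2f} Mbit/s")
print(f"512 KB window: {expected_mbps(512 * 1024, RTT_S, LINK_BPS):.2f} Mbit/s")
```

Under these assumptions about 187,500 bytes in flight is enough to fill the link, so a 512KB window should not be what limits a single flow to 7 to 16Mbit/sec.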

We then rebooted the NTE equipment at both sites and retested; the following results were seen:

2 (Client) -> 3 (Server) JPerf test = Approx 7Mbit/sec (repeatable)

3 (Client) -> 2 (Server) JPerf test = Approx 94Mbit/sec (1st test)

3 (Client) -> 2 (Server) JPerf test = Approx 32Mbit/sec (2nd test)

3 (Client) -> 2 (Server) JPerf test = Approx 94Mbit/sec (3rd test)

3 (Client) -> 2 (Server) JPerf test = Approx 32Mbit/sec (4th test), etc. (it just kept alternating)

We then power cycled the NTE equipment again and the results went back to:

2 (Client) -> 3 (Server) JPerf test = Approx 7Mbit/sec

3 (Client) -> 2 (Server) JPerf test = Approx 16Mbit/sec

We have spoken to the ISP and they deny there is anything wrong. Can anyone help me come up with a way to prove this (with real traffic rather than RFC2544 tests)?

Any help at all would be very much appreciated!!

Thanks,

Peter

Edit: I forgot to mention that multiple streams can maximize throughput in both directions (2->3 and 3->2).


Accepted Solutions

From memory, it turned out there were missing routes in the provider's core, which meant some return traffic was being dropped and causing retransmissions (hence why the RFC2544 (UDP) test didn't show an issue but the RFC6349/JPerf (TCP & UDP) tests did).


lgijssel
Level 9

Two issues should be investigated:

1: Packet loss on the link. This can severely affect your end-to-end throughput, and bigger windows can reduce the net throughput further.

2: MTU size issues. If there is fragmentation on the link, this will also seriously degrade performance.
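To put numbers on point 1, a commonly used rule of thumb (swapped in here, not something cited in this thread) is the Mathis et al. (1997) approximation for the ceiling a single Reno-style TCP flow reaches at loss rate p: rate ≈ (MSS / RTT) × C / √p, with C = √(3/2) ≈ 1.22. A rough Python sketch, assuming a 1460-byte MSS and a 15ms RTT; the model only holds for small, random loss, so treat the result as an order-of-magnitude guide:

```python
import math


def mathis_throughput_mbps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. steady-state estimate for one Reno-style TCP flow.

        rate ~= (MSS / RTT) * C / sqrt(p),  with C = sqrt(3/2) ~= 1.22

    Only meaningful for small, random loss rates.
    """
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be between 0 and 1 (exclusive)")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate) / 1e6


MSS_BYTES = 1460  # assumed MSS for a 1500-byte MTU
RTT_S = 0.015     # assumed round-trip time

for p in (0.0001, 0.001, 0.01):
    print(f"loss {p:.2%}: ~{mathis_throughput_mbps(MSS_BYTES, RTT_S, p):.1f} Mbit/s")
```

Under these assumptions even 1% loss limits a single flow to roughly 9.5Mbit/s, well below what the window alone would allow.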

 

Option 1 is obviously the most likely candidate. Use netstat -s to check the protocol statistics, for example:

TCP Statistics for IPv4

  Active Opens                        = 311
  Passive Opens                       = 121
  Failed Connection Attempts          = 9
  Reset Connections                   = 104
  Current Connections                 = 7
  Segments Received                   = 82402
  Segments Sent                       = 62862
  Segments Retransmitted              = 300
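A small Python sketch for turning those counters into a retransmission ratio, so test runs in each direction can be compared directly. It assumes the Windows "Name = value" layout shown above and should be fed only the "TCP Statistics for IPv4" section (other sections reuse the same counter names). The counters are cumulative, so a snapshot taken just before the test can be subtracted; the helper names are hypothetical:

```python
import re
from typing import Dict, Optional

COUNTER = re.compile(r"^\s*(.+?)\s*=\s*(\d+)\s*$")


def parse_netstat_tcp(text: str) -> Dict[str, int]:
    """Collect 'Name = value' counters from one section of Windows netstat -s output."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def retransmit_ratio(after: Dict[str, int], before: Optional[Dict[str, int]] = None) -> float:
    """Share of sent segments that were retransmissions between two snapshots."""
    before = before or {}
    sent = after["Segments Sent"] - before.get("Segments Sent", 0)
    retransmitted = after["Segments Retransmitted"] - before.get("Segments Retransmitted", 0)
    return retransmitted / sent if sent else 0.0


SAMPLE = """\
  Segments Received                   = 82402
  Segments Sent                       = 62862
  Segments Retransmitted              = 300
"""

print(f"{retransmit_ratio(parse_netstat_tcp(SAMPLE)):.2%}")
```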


regards,

Leo

Hi,

 

Thanks for the reply. I agree that there is probably packet loss, but I think it may be more complicated than that.

 

When running netstat -s during the tests:

2 (Client) -> 3 (Server): 2 (Client) shows approx 4000 retransmitted segments out of approx 20000; 3 (Server) shows 0 retransmissions.

3 (Client) -> 2 (Server): 3 (Client) shows 0 retransmitted segments out of approx 20000; 2 (Server) shows 0 retransmissions.

 

We are now looking into running an RFC6349 test, as RFC2544 passes.

 

We also managed to get bi-directional JPerf working (albeit once) at 100Mbit, but only when using the "dual" option in JPerf.

 

Regarding the MTU size, could we prove that with a ping sweep (multiple sizes) with the "don't fragment" option set? If we see drops at a given size, do we know that size doesn't work, or would that only reflect the buffer size?
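A ping sweep with Don't Fragment set is a reasonable way to find the largest packet that crosses the path unfragmented. A minimal Python sketch that binary-searches the ICMP payload size, assuming a Linux host with iputils ping ("-M do" sets DF, "-s" is the payload size; on Windows the equivalent ping flags are "-f" and "-l"). The largest passing payload plus 28 bytes of IPv4 and ICMP headers approximates the path MTU. A single lost reply looks the same as "too big" here, so repeat the check around the boundary; the helper names and the target address are hypothetical:

```python
import subprocess
from typing import Callable

IP_ICMP_HEADERS = 28  # 20-byte IPv4 header (no options) + 8-byte ICMP header


def linux_df_ping(host: str) -> Callable[[int], bool]:
    """Build a probe that sends one DF-flagged echo request of a given payload size."""
    def probe(payload: int) -> bool:
        result = subprocess.run(
            ["ping", "-M", "do", "-s", str(payload), "-c", "1", "-W", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0  # 0 means a reply came back
    return probe


def largest_df_payload(probe: Callable[[int], bool], low: int = 0, high: int = 9000) -> int:
    """Binary-search the largest payload that passes with DF set.

    Assumes probe(low) succeeds and that success is monotonic in size.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Example (hypothetical far-end address):
#   payload = largest_df_payload(linux_df_ping("192.0.2.10"), high=1600)
#   print(payload, "bytes payload ->", payload + IP_ICMP_HEADERS, "byte path MTU")
```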

 

Thanks,

Peter

I've just finished testing with a JDSU T-BERD 6000A.

RFC2544 has passed in both directions

RFC6349 has passed in both directions (it estimated 20 connections of 8KB window size to be the most effective)
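For comparison with the single-connection numbers earlier in the thread, the aggregate window-limited ceiling of the tester's suggested configuration can be checked the same way. A sketch assuming KB means 1024 bytes and an RTT of 15ms (the RTT the tester itself measured is not stated); the helper name is hypothetical:

```python
def aggregate_window_limit_mbps(streams: int, window_bytes: int, rtt_s: float) -> float:
    """Combined window-limited ceiling of parallel TCP connections (decimal Mbit/s)."""
    return streams * window_bytes * 8 / rtt_s / 1e6


RTT_S = 0.015  # assumed round-trip time, as in the JPerf calculations

print(f"20 x 8 KB:   {aggregate_window_limit_mbps(20, 8 * 1024, RTT_S):.1f} Mbit/s")
print(f" 1 x 512 KB: {aggregate_window_limit_mbps(1, 512 * 1024, RTT_S):.1f} Mbit/s")
```

Under those assumptions 20 × 8KB gives roughly 87.4Mbit/s of combined window, while a single 512KB window is already well above the 100Mbit/s line rate.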

RFC6349 has passed in both directions (when set to 1 connection of 512KB window size)

I've tried multiple combinations (MSS of 1452, window of 1500, etc.) to reproduce the results I'm getting from JPerf, but I can't reproduce the issue.

Y.1564 can be run too, if I can work out how to set CIR, EIR, M, etc. correctly.

Surely this can't be pointing to an underlying Windows/Linux issue?? (I've tried multiple operating systems on multiple hardware.)

Really am stumped right now...


The high retransmit rate in one direction can be an indication of a duplex mismatch. This happens often with SP connections because many providers still hard-set the speed/duplex. If the customer side is not set to match, you will have a duplex mismatch.
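On a Linux test laptop, the negotiated speed and duplex can be read from sysfs (/sys/class/net/<interface>/speed and /sys/class/net/<interface>/duplex). A minimal Python sketch; the interface name is an assumption, and this only shows the laptop's side of the link, so the NTE's setting still has to come from the ISP. As a general rule of thumb, in a mismatch the half-duplex end tends to log late collisions while the full-duplex end logs CRC/FCS errors:

```python
from pathlib import Path
from typing import Tuple

SYSFS_NET = Path("/sys/class/net")


def link_settings(iface: str, sysfs: Path = SYSFS_NET) -> Tuple[str, str]:
    """Return (speed in Mbit/s, duplex) as reported by the Linux kernel for iface.

    Reading these files can raise OSError while the link is down.
    """
    base = sysfs / iface
    speed = (base / "speed").read_text().strip()
    duplex = (base / "duplex").read_text().strip()
    return speed, duplex


def is_full_duplex(iface: str, sysfs: Path = SYSFS_NET) -> bool:
    """Print the link settings and report whether the interface is full duplex."""
    speed, duplex = link_settings(iface, sysfs)
    print(f"{iface}: {speed} Mbit/s, {duplex} duplex")
    return duplex == "full"


# Example (hypothetical interface name): is_full_duplex("eth0")
```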

regards,

Leo

Hi Peter,

I agree with Leo and suggest verifying thoroughly that the entire transmission chain is full-duplex and that there are no duplex mismatches.

I once had an issue with an awfully slow link between two buildings. The link was optical, with a standalone media converter at each end converting the fiber ports to 100Mbps Ethernet. The throughput was outright dismal, nowhere near 100Mbps. Only after a lot of digging did I notice that the media converters (from Allied Telesyn) had a button that let you choose either full or half duplex, and it was set to half. As soon as I switched the fiber link to full-duplex operation, it worked like a charm.

Please note that not only duplex mismatches but even a half duplex properly set at both ends of a link is often deleterious to the TCP throughput.

Best regards,

Peter

I've asked the ISP for clarification on the MTU settings but haven't heard anything back. Last I heard, we were told that we HAD to use Auto/Auto and they would set the speed/duplex on their side.

When the JDSU testers were set to Auto/Auto they passed the RFC tests fine, so I think we can rule this out as a potential problem.

As it stands, the JDSU testers pass all the RFC tests, whereas the JPerf tests on the laptops still show the same issue. (We've tried different OSs and hardware configurations on both sides.)

Many Thanks,

Pete

Have you tried increasing the system MTU of the 2921 switches?



Kind Regards
Paul

Hi,

Currently I'm testing directly into the NTE on both sides to rule out as many potential issues as possible, so the only place MTU/MSS etc. should matter is on the test devices.

Thanks,

Pete

Hi,

As Petr Paluch advised, I would check how the ports on the ISP's NTUs are configured in terms of speed and duplex. If in doubt, your ISP should provide you with all the necessary information about the settings of those devices; if possible, ask them to force the switchports to fixed values such as 100Mbps/full duplex.

You can also use Wireshark to see which MSS values each side advertises in the TCP handshake (and so which MTU the endpoints assume).
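Each side advertises its MSS as TCP option kind 2 in its SYN or SYN/ACK, and Wireshark decodes it in the TCP options of those packets. To check it programmatically instead, a minimal Python sketch that pulls the MSS out of raw TCP header bytes (IP header already stripped); the helper name is hypothetical:

```python
import struct
from typing import Optional


def tcp_mss_option(tcp_header: bytes) -> Optional[int]:
    """Return the MSS advertised in a TCP header's options, or None if absent.

    tcp_header must start at the TCP source port (IP header already removed).
    """
    header_len = (tcp_header[12] >> 4) * 4  # data offset field, in bytes
    options = tcp_header[20:header_len]     # options follow the fixed 20 bytes
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                       # End of Option List
            break
        if kind == 1:                       # No-Operation padding byte
            i += 1
            continue
        if i + 1 >= len(options):
            break                           # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                           # malformed option
        if kind == 2 and length == 4:       # Maximum Segment Size
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length
    return None
```

An advertised MSS of 1460 corresponds to a 1500-byte MTU (1460 + 20 bytes IPv4 + 20 bytes TCP).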

Have you stored the test results from RFC2544 & RFC6349 somewhere in a PDF file?

BR,

Jacek

Hi,

That's a good shout. While I am specifying the MTU etc. inside JPerf, I'll see if I can confirm it with Wireshark.

I have the RFC tests in PDF so that I can take them to the ISP if required. I'd rather not post them until I can remove private information first.

Thanks,

Pete

garylgilbert
Level 1

What was the resolution of this problem? I am having a very similar issue. The phone company's JDSU testing is clean, but I still can't transfer files over 1MB between computers or download files over 1MB. Thank you!

