I am running a number of sites with 4 circuits between them using per-packet load balancing.
They work well as far as balance goes, but I have had issues with very low file transfer throughput. During testing I can actually get better throughput by shutting down 2 of the circuits.
There are no errors anywhere, and all circuits are exactly the same in speed and latency.
After getting packet captures from both client machines, I see a huge number of retransmissions. No packet is actually lost in the capture, so these are not valid retransmissions.
From what I can tell it is due to the packets arriving out of order. This causes many ACKs to be sent for the same sequence number, which the far end interprets as loss, so it retransmits. From reading the RFC, it appears that 3 duplicate ACKs is the magic number that triggers a retransmission.
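The mechanism can be sketched with a toy model (a simplification for illustration, not any real TCP stack): the receiver ACKs the highest in-order byte on every arrival, and the sender fast-retransmits after 3 duplicate ACKs.

```python
# Toy model (not a real TCP stack): the receiver sends a cumulative ACK for
# every segment that arrives; out-of-order arrivals produce duplicate ACKs,
# and the sender fast-retransmits once it sees 3 duplicates (RFC 2581).

def receiver_acks(arrival_order):
    """Yield the cumulative ACK sent in response to each arriving segment."""
    expected = 0
    received = set()
    for seg in arrival_order:
        received.add(seg)
        while expected in received:
            expected += 1
        yield expected  # cumulative ACK: next segment the receiver wants

def count_fast_retransmits(arrival_order, dupthresh=3):
    """Count how many times the sender would trigger fast retransmit."""
    retransmits = 0
    last_ack, dups = None, 0
    for ack in receiver_acks(arrival_order):
        if ack == last_ack:
            dups += 1
            if dups == dupthresh:
                retransmits += 1
        else:
            last_ack, dups = ack, 0
    return retransmits

# Segment 0 delayed behind four of its successors: nothing is lost, yet
# 3 duplicate ACKs fire a spurious retransmit.
print(count_fast_retransmits([1, 2, 3, 4, 0]))  # -> 1
# In-order delivery never retransmits.
print(count_fast_retransmits([0, 1, 2, 3, 4]))  # -> 0
```

With the default threshold of 3, a single segment delayed behind four of its successors is enough to fire a spurious retransmit even though nothing was lost.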
I know I can use Multilink PPP to solve the out-of-order issue, but it raises the knowledge level needed to support the network. I have a few NOC guys who will not understand that circuits can be down but still appear to ping.
Any suggestions? I have been looking for a parameter in the TCP stack that would affect this, but the behaviour appears to be very fundamental to TCP.
It does sound like it is due to packets arriving out of order. Normally the TCP implementation should be able to handle this and reorder the segments. However, as you have observed, some implementations simply don't do that very well.
It sounds like you have considerable jitter between the various paths. Maybe one of the paths is congested and the others are not. That could cause a lot of out-of-order delivery.
If your paths are multihop, the problem could be made worse, because per-packet load balancing only sees the local router; it does not know what is happening two hops down the line.
However, you do say that you are considering PPP Multilink, so I guess you are just one hop away (unless you are multilinking over tunnels, of course).
I guess it is going to depend on window size, segment size, and latency jitter. It sounds like you may actually be better off with per-destination.
I wish I could use per-session load balancing, but the main reason many of these sites were upgraded was a new requirement to transfer large amounts of data. The users would be limited to a single circuit if I did that, since all the traffic goes between 2 machines.
They will never be able to use more than 2 links' worth of bandwidth because of the latency and their use of CIFS to transfer the files. At least if I get this fixed I can say it's not the network, whereas right now it truly is a network issue causing their delays.
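For what it's worth, the CIFS ceiling comes from older SMB issuing reads serially in fixed-size blocks, so each block costs at least one round trip. A rough back-of-the-envelope sketch (the RTT and block size are assumptions for illustration, not numbers from this thread):

```python
# Rough model of SMB1-style serialized reads: one block is requested and
# delivered per round trip, so throughput caps at block_size / RTT no
# matter how much link bandwidth exists.  Numbers are assumed, not
# measurements from this thread.

def cifs_throughput_bps(block_bytes, rtt_seconds):
    """Approximate ceiling when each block must complete before the next."""
    return block_bytes * 8 / rtt_seconds

rtt = 0.040          # 40 ms WAN round-trip time (assumed)
block = 60 * 1024    # ~60 KB read size, typical of older SMB (assumed)

print(cifs_throughput_bps(block, rtt) / 1e6)  # ~12 Mbit/s, link count irrelevant
```

Under these assumed numbers the transfer stalls around 12 Mbit/s however many circuits are bundled, which matches the poster's point that latency plus CIFS, not raw bandwidth, sets the limit.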
The limitation for CIFS transfer speed usually isn't CIFS so much as the default TCP receive window size in XP and earlier. You can modify the default in the registry. Or, if you're running on a 100 Mbps host connection, move to gig. (Windows increases the default for gig connections.)
Vista TCP stacks advertise a large TCP receive window. We've seen them pull data 3x faster than an XP client across high-BDP links, when both are running registry defaults.
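The window effect is just bandwidth-delay arithmetic: a single flow can never exceed its receive window divided by the RTT. A quick sketch (the RTT is an assumed value for illustration; 17520 bytes is the commonly cited XP default window for Ethernet):

```python
# A window-limited TCP flow tops out at rwnd / RTT.
# RTT below is an assumed illustration, not a measurement from this thread.

def max_throughput_bps(rwnd_bytes, rtt_seconds):
    """Upper bound on one TCP connection's throughput when window-limited."""
    return rwnd_bytes * 8 / rtt_seconds

rtt = 0.040               # 40 ms WAN round-trip time (assumed)
xp_default = 17520        # commonly cited XP default receive window (bytes)
large_rwnd = 256 * 1024   # a Vista-style autotuned window (illustrative)

print(max_throughput_bps(xp_default, rtt) / 1e6)  # ~3.5 Mbit/s
print(max_throughput_bps(large_rwnd, rtt) / 1e6)  # ~52 Mbit/s
```

The ratio between those two ceilings is why a larger advertised window alone can triple or better a transfer over a high-BDP path.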
Adding a comment:
As you know, TCP applications require special treatment.
This is typical TCP synchronization behaviour: the window size grows, gets interrupted at some point, drops back, and starts growing again.
I suspect that per-packet load balancing affects some TCP sessions, forcing them to resynchronize with their peers, which results in the transmission behaviour you've seen.
With per-packet, the router spreads packets over links with the same routing metric in strict round-robin fashion. Because of differing packet sizes and differing queue conditions on the individual links, packets are easily delivered out of sequence, with the consequences described in this thread.
With per-destination or CEF load balancing, the router actually computes "flows" or "sessions" based on the source and destination IP addresses, and associates each flow with a link for the duration of a caching period. This way packet arrival order is preserved within a flow, and everyone is happy.
The default is per-destination, and for good reasons.
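The difference between the two modes can be sketched roughly as follows (a simplification for illustration, not actual CEF code; the link names and addresses are made up):

```python
# Simplified sketch of the two load-balancing modes (not actual CEF code).
import itertools

LINKS = ["serial0", "serial1", "serial2", "serial3"]

# Per-packet: strict round-robin, ignoring which flow a packet belongs to,
# so one flow's packets are sprayed across every link.
_rr = itertools.cycle(LINKS)
def per_packet(src_ip, dst_ip):
    return next(_rr)

# Per-destination: hash the address pair so every packet of a flow takes
# the same link, preserving arrival order within the flow.
def per_destination(src_ip, dst_ip):
    return LINKS[hash((src_ip, dst_ip)) % len(LINKS)]

flow = ("10.0.0.1", "10.0.1.1")
print({per_packet(*flow) for _ in range(8)})       # all four links
print({per_destination(*flow) for _ in range(8)})  # one link, every time
```

The price of per-destination, as noted earlier in the thread, is that a single host pair is pinned to one link, which is exactly the constraint the original poster is trying to escape.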
Multilink PPP is definitely the best way to solve this issue. I seriously don't think it raises the knowledge level needed to support it all that much.
I'd recommend getting a good piece of network management software, such as SolarWinds Orion, that lets you add all your physical T1s as well as your multilink bundle. That way your NOC can see exactly what goes down very easily. Just put in the router's management IP during configuration and it tells you if a circuit on that router goes down. You don't have to bother with pinging each interface.
Works very well for us in these situations.
Could you clarify "But out-of-order arrival doesn't cause duplicate acks, it causes window resets." with reference to RFC 2001 and RFC 2581, which discuss the generation of duplicate ACKs for out-of-order TCP segments? Perhaps by "window reset" you have in mind Fast Recovery, which is initiated by duplicate ACKs and reduces the send window, thereby reducing the transmission rate?
I agree with the recommended solutions that avoid the reordering issue, but reordering really isn't a broken network, since IP doesn't guarantee sequencing. TCP is designed to deal with reordering, but its default Fast Retransmit/Fast Recovery dup-ACK count assumes bounds on how severe the out-of-sequence condition normally gets. In this case, those normal expectations are likely being exceeded.
The real risk of changing the default settings is that they tend to be global. I.e., increasing the value will likely make TCP too lax for other "normal" flows. That noted, if one is looking for a possible short-term solution, and understands the impact, the value might be increased an increment or two. Longer term, the network can be changed.
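To make the trade-off concrete, here is a trivial model of how far one segment can be displaced by reordering before the duplicate-ACK counter misfires (an illustration only, not any real stack's logic):

```python
# Toy model: a segment delayed by `displacement` positions, with no real
# loss, generates one duplicate ACK per segment that overtakes it.  The
# sender misfires a fast retransmit once duplicates reach `dupthresh`.

def spurious_retransmit(displacement, dupthresh=3):
    """True if pure reordering would still trigger a fast retransmit."""
    return displacement >= dupthresh

print(spurious_retransmit(4, dupthresh=3))  # True: the RFC default misfires
print(spurious_retransmit(4, dupthresh=5))  # False: a laxer threshold copes
print(spurious_retransmit(2, dupthresh=3))  # False: mild reordering is fine
```

The flip side, not shown, is that a larger threshold delays recovery from genuine loss by the same number of ACKs, which is why the caution above about global settings is well placed.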
Just did a bit more digging. Are the duplicate ACKs you have in mind the ones a receiver uses to advertise a change in its receive window size?