I am running a number of sites with 4 circuits between them, using per-packet load balancing.
They balance well, but file transfer throughput has been very low. During testing I can actually get better throughput by shutting down 2 of the circuits.
There are no errors anywhere, and all circuits are identical in speed and latency.
After getting packet captures from both client machines, I see a huge number of retransmissions. No packet is actually missing from the captures, so these are not valid retransmissions.
From what I can tell it is due to the packets arriving out of order. The receiver sends many duplicate ACKs for the same sequence number, which the far end interprets as loss, so it retransmits. From reading the RFC, it appears that 3 duplicate ACKs is the magic number that triggers a retransmission.
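The duplicate-ACK behavior described above can be sketched in a few lines. This is a toy model, not a real TCP stack; the segment size and the 3-dup-ACK threshold are the only numbers taken from the discussion, everything else is illustrative:

```python
# Toy model of fast retransmit: reordering alone, with no real loss,
# still generates duplicate ACKs, and at 3 dup ACKs the sender
# retransmits a segment that was never actually lost.

def acks_for_arrivals(arrivals, seg_size=1000):
    """Return the cumulative ACK a receiver emits for each arriving segment."""
    received = set()
    next_expected = 0
    acks = []
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:
            next_expected += seg_size
        acks.append(next_expected)  # cumulative ACK: next byte wanted
    return acks

def spurious_retransmits(acks, dup_threshold=3):
    """Count fast retransmits a sender would fire off these ACKs."""
    retransmits = 0
    dup_count = 0
    last_ack = None
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                retransmits += 1  # sender assumes the segment was lost
        else:
            last_ack, dup_count = ack, 0
    return retransmits

# Segment 0 delayed behind 4 later segments: no loss, just reordering.
acks = acks_for_arrivals([1000, 2000, 3000, 4000, 0])
print(acks)                        # [0, 0, 0, 0, 5000] -> three dup ACKs for 0
print(spurious_retransmits(acks))  # 1 spurious fast retransmit
```

With in-order arrival the same code produces no duplicate ACKs at all, which matches what you see when you shut down circuits.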
I know I can use multilink PPP to solve the out-of-order issue, but it raises the knowledge level needed to support the sites. I have a few NOC guys who will not understand that circuits can be down but still appear to ping.
Any suggestions? I have been looking for a parameter in the TCP stack that would affect this, but this behavior appears to be fundamental to TCP.
It does sound like it is due to packets arriving out of order. Normally the TCP implementation should be able to handle this and reorder the segments. However, as you have observed, some implementations simply don't do that very well.
It sounds like you have considerable jitter between the various paths. Maybe one of the paths is congested and the others are not. That could really cause a lot of out-of-order delivery.
If your paths are multihop, the problem could be made worse because the per-packet load balancing knows only about the local router - it does not know what is two hops down the line.
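The reordering effect is easy to demonstrate with a toy round-robin model. The delay numbers below are made up purely for illustration; real jitter comes from queueing along each path:

```python
# Sketch of why per-packet balancing reorders TCP: send packets
# round-robin over paths whose one-way delays differ slightly,
# then see what order they arrive in.

def arrival_order(num_packets, path_delays, gap=1.0):
    """Send packets round-robin over paths; return packet numbers in arrival order."""
    arrivals = []
    for i in range(num_packets):
        delay = path_delays[i % len(path_delays)]
        arrivals.append((i * gap + delay, i))  # (arrival time, packet #)
    arrivals.sort()
    return [pkt for _, pkt in arrivals]

# Four "identical" circuits whose queues add a few ms of jitter.
order = arrival_order(8, [10.0, 13.5, 11.0, 16.0], gap=1.0)
print(order)  # [0, 2, 4, 1, 6, 5, 3, 7] -> heavily reordered
```

Even a few milliseconds of differential delay is enough to scramble the sequence, and every scrambled run is a fresh burst of duplicate ACKs at the receiver.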
However, you do say that you are considering ppp multilink, so I guess you are just one hop away (unless you are multilinking down tunnels of course).
I guess it is going to depend on window size, segment size, and latency jitter. It sounds like you may actually be better off with per-destination load balancing.
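If you do go per-destination, on Cisco IOS with CEF it is an interface-level change. This is a sketch only; verify the syntax on your platform and IOS version:

```
! Example only - interface name is hypothetical.
! CEF defaults to per-destination; this removes per-packet sharing.
interface Serial0/0
 no ip load-sharing per-packet
 ip load-sharing per-destination
```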
I wish I could use per-session load balancing, but the main reason many of these sites were upgraded was a new requirement to transfer large amounts of data. The users would be limited to a single circuit if I did that, since all the traffic goes between 2 machines.
They will never be able to use more than 2 links' worth of bandwidth because of the latency and their use of CIFS to transfer the files. At least if I get this fixed I can say it's not the network, whereas right now it truly is a network issue causing their delays.
The limitation for CIFS transfer speed usually isn't CIFS itself so much as the default TCP receive window size in XP and earlier. You can modify the default in the registry. Or, if you're running on a 100 Mbps host connection, move to gig. (Windows increases the default for gig connections.)
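For reference, a sketch of that registry change on XP. The values below are illustrative, not a recommendation; test before deploying, note that a reboot is needed, and window sizes above 64 KB also require window scaling (Tcp1323Opts):

```
Windows Registry Editor Version 5.00

; Example values only: 0x40000 = 256 KB receive window.
; Tcp1323Opts=3 enables RFC 1323 window scaling and timestamps,
; without which TcpWindowSize is capped at 65535 bytes.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"TcpWindowSize"=dword:00040000
"Tcp1323Opts"=dword:00000003
```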
Vista TCP stacks advertise a large TCP receive window. We've seen them pull data 3x faster than an XP client across high-BDP links, with both running registry defaults.
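The window/RTT arithmetic behind that difference is easy to check. The numbers here (64 KB window, 50 ms RTT, 256 KB scaled window) are illustrative, not measurements from the poster's network:

```python
# A single TCP connection can never move data faster than
# receive window / round-trip time, regardless of link speed.

def max_throughput_mbps(window_bytes, rtt_seconds):
    """Window-limited TCP throughput in megabits per second."""
    return window_bytes * 8 / rtt_seconds / 1e6

xp_default = max_throughput_mbps(65535, 0.050)   # ~10.5 Mbps
scaled     = max_throughput_mbps(262144, 0.050)  # ~41.9 Mbps
print(round(xp_default, 1), round(scaled, 1))
```

So on a 50 ms path, an unmodified XP client tops out around 10 Mbps per connection no matter how many circuits are bundled, which is consistent with the Vista-vs-XP difference described above.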