Solved: BDP, TCP RWIN and optimal data flow

Orkhan Gasimov · 02-22-2017

Hi everyone!

Can anybody please explain Bandwidth Delay Product (BDP) and its relation to the optimal TCP RWIN value?
It's actually two questions.

1. If BDP is the amount of data that can be in-transit on a link at any time, why does it say here:
http://www.cisco.com/c/en/us/td/docs/nsite/enterprise/wan/wan_optimization/wan_opt_sg/chap06.html
that BDP = Bandwidth x RTT (assuming RTT is two-way latency),
while it says here:
http://packetlife.net/blog/2010/aug/4/tcp-windows-and-window-scaling/
that BDP = Bandwidth x Delay (assuming Delay is one-way latency)?

The second explanation seems more logical to me. If a device can send at 155 Mbps and the one-way latency on a link is
0.1 sec, it seems more reasonable to calculate 155 Mbps x 0.1 sec = 15.5 Mb (or about 1.94 MB) to find the
amount of data that can be on the link IN ONE DIRECTION at any time. And shouldn't we be interested in ONE
DIRECTION only if we aim to somehow connect this BDP value to TCP RWIN? (TCP RWIN specifies how many bytes a device
is willing to receive before it sends an acknowledgement; it does not specify anything about how many bytes a
device is willing to send, i.e. it's a parameter related to one direction only, not both directions.)
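
For reference, the two candidate BDP values for this example can be computed side by side. This is only a sketch of the arithmetic; the helper name `bdp_bytes` is made up for illustration, and the round-trip figure assumes the latency is the same in both directions.

```python
def bdp_bytes(bandwidth_bps: float, delay_s: float) -> float:
    """Bandwidth-delay product in bytes, for a rate in bits/s and a delay in seconds."""
    return bandwidth_bps * delay_s / 8


LINK_BPS = 155e6   # 155 Mbps, from the example above
ONE_WAY_S = 0.1    # one-way latency, from the example above

one_way = bdp_bytes(LINK_BPS, ONE_WAY_S)          # data in flight in one direction
round_trip = bdp_bytes(LINK_BPS, 2 * ONE_WAY_S)   # assumes symmetric latency

print(f"Bandwidth x Delay: {one_way / 1e6:.2f} MB")
print(f"Bandwidth x RTT:   {round_trip / 1e6:.2f} MB")
```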

2. No matter whether an author suggests calculating BDP as Bandwidth x RTT or Bandwidth x Delay, both authors say
that an ideal TCP RWIN value should be Bandwidth x RTT (or 2 x Bandwidth x Delay). Why is that the case?

Let's assume that two devices (A and B) operating at 80 Mbps (or 10 MB/s) are connected by a link with a one-way
latency of 0.1 sec (the same in both directions). Let's also assume that TCP RWIN is 50 kB, or
0.05 MB. Let's calculate the BDP in one direction: 10 MB/s x 0.1 sec = 1 MB (the amount of data that can be on the link at any time).
A wants to send traffic to B. B advertises an RWIN of 0.05 MB. As A can send at 10 MB/s, it can send 0.05 MB in
0.005 sec, so the following happens (times below are A's local time):
- 0.005 sec: A sends 0.05 MB;
- 0.105 sec: the traffic reaches B;
- 0.105 sec: B sends an acknowledgement (let's omit some processing time);
- 0.205 sec: the acknowledgement reaches A;
- 0.205 sec: A sends the next 0.05 MB...
This way, after 5 repetitions, A will have sent 0.25 MB to B in 1.025 sec, or ~1.5 MB in 6 seconds.
Let's now calculate the BDP as Bandwidth x RTT: 10 MB/s x 0.2 sec = 2 MB. If an ideal TCP RWIN is 2 MB, the
calculation above would go as follows (A would be able to send 2 MB in 0.2 sec as it can send 10 MB in 1 sec):
- 0.2 sec: A sends 2 MB;
- 0.3 sec: the traffic reaches B;
- 0.3 sec: B sends an acknowledgement;
- 0.4 sec: the acknowledgement reaches A;
- 0.4 sec: A sends the next 2 MB...
This way, after 5 repetitions, A will have sent 10 MB to B in 2 sec, or ~30 MB in 6 seconds. Much better, but why is it ideal? Why can't TCP RWIN be even larger? Or am I missing something obvious?
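
The arithmetic of the two timelines above can be reproduced with a short sketch. It encodes only the simplified model used in this post (send a whole window, then sit idle until a single acknowledgement returns), which is an assumption of the example rather than a description of real TCP behaviour; the function name is made up for illustration.

```python
def stop_and_wait_throughput(window_mb: float, rate_mb_s: float, one_way_s: float) -> float:
    """Throughput in MB/s if the sender emits one full window, then idles until its ACK returns."""
    send_time = window_mb / rate_mb_s    # time to put the whole window on the wire
    cycle = send_time + 2 * one_way_s    # plus data travelling out and the ACK travelling back
    return window_mb / cycle


for window in (0.05, 2.0):
    rate = stop_and_wait_throughput(window, rate_mb_s=10, one_way_s=0.1)
    print(f"RWIN {window} MB -> {rate:.2f} MB/s, about {rate * 6:.1f} MB in 6 s")
```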

Thank you very much in advance!

Joseph W. Doherty · 02-23-2017

"I understand it this way: B sends an acknowledgement only after A finishes sending 22 segments."

No, TCP doesn't work that way.

Yes, the PacketLife statement is correct. TCP must pause its transmission when there's no longer sufficient RWIN space for the next segment. ACKs provide information about available RWIN space, but TCP doesn't wait until RWIN is filled to send an ACK; again, it normally ACKs every 2nd packet.

The ACKs provide ongoing updates to the current RWIN value. As packets are received, RWIN decreases. As the receiving application accepts the data, RWIN increases. So, while packets are in flight, the RWIN in the next ACK might be the same as in the last one, or smaller, or larger.

Where this can be a bit confusing is that the sender cannot rely on just the ACK's RWIN, because it also has to account for packets that are still in flight.
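
One common way to express this: the sender subtracts the bytes it has sent but not yet had acknowledged from the most recently advertised RWIN. The following is a minimal sketch with hypothetical names that ignores the congestion window; the byte counts are illustrative.

```python
def usable_window(advertised_rwin: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still transmit without overrunning the receiver."""
    in_flight = last_byte_sent - last_byte_acked   # sent but not yet acknowledged
    return max(0, advertised_rwin - in_flight)


# Receiver advertised 32,768 bytes; 20 segments of 1,460 bytes are still unacknowledged.
print(usable_window(32_768, last_byte_sent=29_200, last_byte_acked=0))
```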

BTW, the sender also uses the stream of returning ACKs to keep updating its RTT estimate. It uses this to adjust its timer for lost ACKs, and some newer TCP implementations also use it to detect and respond to what might be congestion before there are drops.

Joseph W. Doherty · 02-22-2017

The reason you use more than half the RTT is that you don't want to stop transmitting when you "fill the pipe" toward the destination. Don't forget the time it takes for the ACKs to come back.

Consider what could happen if RWIN were based on half the RTT. Once the sender "filled the pipe" it would stop transmitting until it got an ACK. Consider that the first ACK couldn't be transmitted until after the first two packets are received. So, the sender would pause its transmission. Further consider that the receiver ACKs the received packets, but they haven't been processed by the receiving application, i.e. the receiver's RWIN hasn't been reopened. Worst case, the sender is unable to keep the pipe filled because the RWIN is only one "fill the pipe" volume.

Using the RTT for RWIN provides time for the receiver's TCP application to process data while allowing the pipe to be kept filled.

Once you get the "pipe filled", and ACKs are returned indicating the RWIN is not being filled, the sender effectively "self clocks": it sends additional packets on receipt of ACKs. I.e. an even larger RWIN wouldn't improve the possible transmission rate.
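
A rough way to see why a larger RWIN stops helping: in steady state the sender can have at most one window outstanding per RTT, and it can never exceed the link rate. The sketch below uses the 10 MB/s link and 0.2 sec RTT from the original question; it is an idealized ceiling that ignores loss, congestion control, and header overhead, and the function name is made up for illustration.

```python
def max_throughput_mb_s(rwin_mb: float, link_mb_s: float, rtt_s: float) -> float:
    """Steady-state ceiling: at most one window per RTT, never above the link rate."""
    return min(link_mb_s, rwin_mb / rtt_s)


LINK_MB_S = 10   # 80 Mbps link from the original question
RTT_S = 0.2      # 0.1 sec each way

for rwin in (0.05, 1.0, 2.0, 4.0):
    print(f"RWIN {rwin} MB -> {max_throughput_mb_s(rwin, LINK_MB_S, RTT_S):.2f} MB/s")
```

With these numbers the ceiling reaches the link rate at an RWIN of 2 MB (Bandwidth x RTT), and doubling the window again changes nothing.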

Orkhan Gasimov · 02-22-2017

Dear Joseph,

Thank you very much for the contribution.

1. Could you please clarify the sentence "Don't forget the time it takes for the ACKs to come back."? How does it relate to filling the pipe? No matter whether I keep the pipe fully loaded or not, the ACK cannot come back before I have sent all the data equal to TCP RWIN. So the benefit of a bigger TCP RWIN is not that it helps an ACK come back while I am still sending data (so that we do not waste extra time); it just makes the ACKs be sent less frequently, so that less time is wasted on them in total. But the pipe being full is the secondary effect, not the primary one, as it seems to me. I might be wrong though; it depends on the answer to the second question below.

2. Could you please clarify the sentence "Consider the first ACK couldn't be transmitted until after the first two packets are received."? Aren't ACKs transmitted only after the data equal to the whole TCP RWIN is sent? If not, what is the windowing concept itself?

Thank you.

Joseph W. Doherty · 02-23-2017

ACKs are sent upon receiving two TCP packets or when the ACK timer expires. (The latter deals with "odd" numbered TCP packets.)
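
As a rough illustration of that rule, the sketch below counts the ACKs a receiver would generate for a burst of full-size segments: one for every second segment, plus one timer-driven ACK if an odd segment is left over. It is a simplified model with a hypothetical function name; it does not simulate the timer itself or any other ACK triggers.

```python
def count_acks(segments: int, ack_every: int = 2) -> int:
    """ACKs generated for a burst of segments under a simple delayed-ACK rule."""
    acks = 0
    pending = 0                      # segments received but not yet acknowledged
    for _ in range(segments):
        pending += 1
        if pending == ack_every:     # second segment arrived: ACK right away
            acks += 1
            pending = 0
    if pending:                      # odd segment left over: ACKed when the timer expires
        acks += 1
    return acks


print(count_acks(22))   # even burst
print(count_acks(23))   # odd burst, last ACK comes from the timer
```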

Regarding the windowing concept, recall there are two windows: one is the RWIN value, the second is the congestion window. The sender uses the lesser of the two. Both are dynamic values.
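
A small sketch of "the lesser of the two". The starting values and the doubling of the congestion window per RTT are illustrative assumptions (a very simplified slow start), not the behaviour of any particular TCP stack.

```python
def effective_window(rwnd: int, cwnd: int) -> int:
    """The sender is limited by whichever window is smaller."""
    return min(rwnd, cwnd)


MSS = 1460
rwnd = 32_768     # advertised by the receiver (illustrative)
cwnd = 2 * MSS    # sender's congestion window (illustrative starting value)

for rtt in range(1, 6):
    print(f"RTT {rtt}: cwnd={cwnd}, rwnd={rwnd}, sender may have {effective_window(rwnd, cwnd)} bytes in flight")
    cwnd *= 2     # simplified slow start: double each round trip
```

Early on the congestion window is the limit; once it grows past the advertised RWIN, the receiver's window takes over.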

Orkhan Gasimov · 02-23-2017

Thank you again, but it's still not clear to me. The confusing part is here:

http://packetlife.net/blog/2010/aug/4/tcp-windows-and-window-scaling/

where it says:

Host A needs to send data to host B. It can tell from host B's advertised window size that it can transmit up to 32,768 bytes of data (in intervals of the maximum segment size, or MSS) before it must pause and wait for an acknowledgment. Assuming an MSS of 1460 bytes, host A can transmit 22 segments before exhausting host B's receive window.
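
The segment count in the quoted passage is just integer division of the advertised window by the MSS, as this quick check shows (variable names are illustrative):

```python
RWIN = 32_768   # advertised window in bytes, from the quoted passage
MSS = 1_460     # maximum segment size in bytes, from the quoted passage

full_segments, leftover = divmod(RWIN, MSS)
print(f"{full_segments} full segments, {leftover} bytes of window left over")
```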

I understand it this way: B sends an acknowledgement only after A finishes sending 22 segments. However, your post says (or I understand it this way) that B sends an acknowledgement as soon as it gets the first 2 segments. Am I mistaken?

(If you meant just the initial SYN - SYN,ACK - ACK, then it's OK, but it's just the connection setup; what about the data flow that follows? My original post was about the data flow, not the initial connection setup.)