
MSS in an MPBGP session

Dear all,

I've had some issues with the MSS of MPBGP sessions between two 7609S routers. I'm trying to understand how the PMTUD procedure works before the MSS of BGP's TCP session is set, so I'm capturing all the traffic exchanged between both routers. PMTUD for BGP sessions is enabled by default ('bgp transport path-mtu-discovery'), while it is not enabled globally. Before the TCP handshake and the subsequent BGP OPEN, I expected to see UDP packets of growing length from both sides verifying the path MTU, so that each side could determine the MSS to use for the BGP TCP session. To my surprise, I don't see this traffic, and both routers choose an MSS equal to the link MTU minus 40 bytes (IP + TCP headers). Could you tell me how the PMTUD procedure works for BGP sessions?

Both routers run IOS 15.2(4)S1, and another strange thing I've found is that when PMTUD is disabled for BGP sessions ('no bgp transport path-mtu-discovery'), the chosen MSS is 576 bytes instead of 536. Any idea why?

Thanks in advance

Kind regards

Octavio


Rolf Fischer
Level 9

Hi Octavio,

in case PMTUD is disabled for BGP sessions 'no bgp transport path-mtu-discovery' the chosen MSS instead of 536 bytes is 576 bytes. Any idea about why this number?

In IPv4, the minimum datagram size every host must accept is 576 bytes, so with PMTUD disabled the router uses this value as the maximum size for the packets it sends. The MSS is typically calculated as the MTU minus 40 bytes (IP and TCP headers; the MSS is just the TCP data size).
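As a quick illustration of the arithmetic above, here is a minimal Python sketch, assuming IPv4 and TCP headers without options (20 bytes each); the function name `mss_from_mtu` is hypothetical:

```python
# Minimal sketch: the TCP MSS derived from an IPv4 MTU, assuming option-less
# 20-byte IPv4 and 20-byte TCP headers (the "MTU - 40" rule).
IPV4_HEADER = 20
TCP_HEADER = 20

def mss_from_mtu(mtu: int) -> int:
    """Return the largest TCP payload that fits in one IPv4 packet of `mtu` bytes."""
    if mtu <= IPV4_HEADER + TCP_HEADER:
        raise ValueError("MTU too small to carry any TCP payload")
    return mtu - IPV4_HEADER - TCP_HEADER

# 576 is the minimum datagram size every IPv4 host must accept.
print(mss_from_mtu(576))   # 536: the classic default MSS when PMTUD is off
print(mss_from_mtu(1500))  # 1460: typical Ethernet MTU
print(mss_from_mtu(4470))  # 4430: POS interface MTU seen in this thread
```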

When you enable PMTUD, the router sends packets up to the MTU size of its output interface, and the MSS can be negotiated between both end hosts during the TCP handshake (both just exchange their local values; the path is not checked this way).

The DF bit will be set on every packet, so if some interface in the path has a lower MTU, the associated device should send an ICMP "Destination Unreachable/Fragmentation needed and DF bit set" message back so that the router can lower the MTU it uses for that interface/connection:

ICMP: dst (1.1.1.1) frag. needed and DF set unreachable rcv from 192.168.12.2
TCP0: ICMP datagram too big received (576), MSS changes from 1460 to 536

R1#show tcp | i Timer|segment|mtu
Event Timers (current time is 0x506CE0):
Timer          Starts    Wakeups            Next
PmtuAger            1          0        0x525928
Status Flags: active open, path mtu discovery
Option Flags: nagle, path mtu capable
Datagrams (max data segment is 536 bytes):
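The reaction to the ICMP message in the log above can be sketched as follows. This is a simplified model written for illustration, not IOS code; the `TcpPath` class is hypothetical, and the 68-byte floor is the minimum path MTU estimate permitted by RFC 1191:

```python
# Simplified model of the sender's reaction to an ICMP "fragmentation needed
# and DF set" message: lower the path MTU estimate to the reported next-hop
# MTU and re-derive the MSS from it. Illustrative only, not IOS source code.
from dataclasses import dataclass

HEADERS = 40      # 20-byte IPv4 header + 20-byte TCP header, no options
MIN_PMTU = 68     # RFC 1191: never reduce the estimate below 68 octets

@dataclass
class TcpPath:
    path_mtu: int

    @property
    def mss(self) -> int:
        return self.path_mtu - HEADERS

    def on_frag_needed(self, reported_mtu: int) -> None:
        # An ICMP message may only lower the estimate; a larger value is ignored.
        self.path_mtu = min(self.path_mtu, max(reported_mtu, MIN_PMTU))

conn = TcpPath(path_mtu=1500)
print(conn.mss)            # 1460
conn.on_frag_needed(576)   # as in "ICMP datagram too big received (576)"
print(conn.mss)            # 536
```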

Then, as long as the MSS value for an interface is lower than the result of the negotiation during the TCP handshake, the router tries to discover path MTU changes dynamically:

TCP0: Pathmtu-Discovery, MSS changes from 536 to 966

R1#show tcp | i Timer|segment|mtu
Event Timers (current time is 0x52E3A8):
Timer          Starts    Wakeups            Next
PmtuAger            2          1        0x542DE8
Status Flags: active open, path mtu discovery
Option Flags: nagle, path mtu capable
Datagrams (max data segment is 966 bytes):

TCP0: Pathmtu-Discovery, MSS changes from 966 to 1452

R1#show tcp | i Timer|segment|mtu
Event Timers (current time is 0x5507B4):
Timer          Starts    Wakeups            Next
PmtuAger            3          2        0x5602A8
Status Flags: active open, path mtu discovery
Option Flags: nagle, path mtu capable
Datagrams (max data segment is 1452 bytes):

TCP0: Pathmtu-Discovery, MSS changes from 1452 to 1460

R1#show tcp | i Timer|segment|mtu
Event Timers (current time is 0x569304):
Timer          Starts    Wakeups            Next
PmtuAger            3          3             0x0
Option Flags: nagle, path mtu capable
Datagrams (max data segment is 1460 bytes):
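The intermediate values in these logs (966 and 1452) equal 1006 - 40 and 1492 - 40, and 1006 and 1492 are entries in the plateau table of RFC 1191. The sketch below reproduces the sequence under the assumption that each PmtuAger expiry raises the path MTU estimate to the next plateau, capped at the 1500-byte interface MTU. This is an inference from the numbers, not documented IOS behaviour, and the helper names are hypothetical:

```python
# Sketch: reproduce the MSS sequence 536 -> 966 -> 1452 -> 1460, ASSUMING the
# path MTU estimate climbs one RFC 1191 plateau per PmtuAger expiry and is
# capped at the outgoing interface MTU. An inference, not IOS documentation.
RFC1191_PLATEAUS = [68, 296, 508, 1006, 1492, 2002, 4352, 8166, 17914, 32000, 65535]
HEADERS = 40  # IPv4 + TCP headers without options

def next_estimate(current_mtu: int, interface_mtu: int) -> int:
    """Return the next plateau above current_mtu, capped at interface_mtu."""
    for plateau in RFC1191_PLATEAUS:
        if plateau > current_mtu:
            return min(plateau, interface_mtu)
    return interface_mtu

def mss_sequence(start_mtu: int, interface_mtu: int) -> list[int]:
    """List the MSS values as the estimate climbs back to the interface MTU."""
    mtu = start_mtu
    seq = [mtu - HEADERS]
    while mtu < interface_mtu:
        mtu = next_estimate(mtu, interface_mtu)
        seq.append(mtu - HEADERS)
    return seq

print(mss_sequence(576, 1500))  # [536, 966, 1452, 1460]
```

On a real path, an ICMP "fragmentation needed" message at any step would push the estimate back down instead of letting it climb further.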

HTH

Rolf

Thank you very much, Rolf, for your explanation. I made a mistake in my original message. What is strange in IOS 15.2(4)S1 is that, when PMTUD is not enabled (no bgp transport path-mtu-discovery), the MSS value is not 536, as expected, but 556 (I mistakenly wrote 576 bytes), as shown in the following output:

C7609S-01#show tcp | i segment|Option
...
Option Flags: higher precedence, nagle                 <<--- no "path mtu capable"
Datagrams (max data segment is 556 bytes):
...

Any idea about why this number instead of the ordinary 536?

Thanks in advance

Octavio

mtsb
Level 1

Great explanation by Fischer. Here is another great article on PMTUD and MSS:

http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml

If you have different types of Layer 1 connectivity end to end, we need to understand where and how much extra overhead is added, since that is what limits the BGP UPDATE packet size/MSS.
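To make the overhead point concrete, here is a small hypothetical helper that subtracts encapsulation overhead from the link MTU before applying the usual 40-byte IPv4+TCP deduction. Only two encapsulations are modelled: GRE over IPv4 (24 bytes: a 20-byte outer IPv4 header plus the 4-byte base GRE header, with no optional key or sequence fields) and PPPoE (8 bytes). The 4406-byte result for GRE over a 4470-byte POS link happens to match the MSS R2 reports towards R1 later in the thread; whether that session actually runs over the GRE tunnel is an assumption.

```python
# Hypothetical helper: MSS left for TCP after encapsulation overhead is
# subtracted from the physical link MTU.
IPV4_TCP_HEADERS = 40

# Per-encapsulation overhead in bytes (illustrative subset only).
OVERHEAD = {
    "gre_ipv4": 24,  # 20-byte outer IPv4 header + 4-byte base GRE header
    "pppoe": 8,      # 6-byte PPPoE header + 2-byte PPP protocol field
}

def effective_mss(link_mtu: int, encapsulations: list[str]) -> int:
    """MSS available to TCP once every listed encapsulation is accounted for."""
    payload_mtu = link_mtu - sum(OVERHEAD[e] for e in encapsulations)
    return payload_mtu - IPV4_TCP_HEADERS

print(effective_mss(4470, []))            # 4430: plain POS link
print(effective_mss(4470, ["gre_ipv4"]))  # 4406: TCP inside GRE over POS
print(effective_mss(1500, ["pppoe"]))     # 1452: TCP over PPPoE on Ethernet
```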

Thanks,

Madhu

Thanks a lot, Madhu. Really interesting. I really appreciate your help.

Regards

Octavio

Hi Octavio,

Can you send the "sh run" of the interfaces and "show ip interface" from both sides, and describe how they are connected? For example, for back-to-back connected interfaces, even when you disable PMTUD the MSS will still be the interface MTU minus 40 bytes. You can use "neighbor a.b.c.d transport path-mtu-discovery disable" instead of disabling it globally under BGP, which affects all neighbors. For some reason it is accounting for 20 bytes more than expected. If possible, could you run "debug ip tcp transaction" while this is happening?

Thanks,

Madhu

Thank you, Madhu. I just wanted to understand why the MSS is 556 bytes, and I didn't give you the whole picture. I'll try to do so now. In fact, I get 556 bytes as MSS without disabling PMTUD.

I have two Cisco 7609S routers (R1 runs 15.2(4)S1 and R2 runs 15.3(3)S1) running iBGP between them, with OSPF as the IGP, connected by an STM16/OC48 SONET link. In addition, they have several iBGP and eBGP sessions with other peers. All of them have PMTUD enabled by default. My suspicion is that 15.2(4)S1 has some sort of bug. If we look at the MSS values, something strange shows up:

R2 BGP sessions:

Peer      MSS   Flag options
XXXXXX    1460  nagle, path mtu capable
XXXXXX    1918  nagle, path mtu capable
XXXXXX    1460  nagle, path mtu capable   <-- IPX
R1        4406  nagle, path mtu capable
XXXXXX    1460  nagle, path mtu capable
XXXXXX    1460  nagle, path mtu capable
XXXXXX    4406  nagle, path mtu capable

R1 BGP sessions:

Peer      MSS   Flag options
XXXXXX    556   nagle
XXXXXX    556   nagle
R2        556   nagle
XXXXXX    1460  higher precedence, nagle, path mtu capable
XXXXXX    556   higher precedence, nagle
XXXXXX    556   higher precedence, nagle
XXXXXX    556   higher precedence, nagle
XXXXXX    556   VRF id set, higher precedence, nagle
XXXXXX    1460  higher precedence, nagle, path mtu capable, md5
XXXXXX    556   higher precedence, nagle

As you can see, R2 seems to be able to run PMTUD. On R1, however, even though PMTUD is enabled, only two sessions have an MSS other than 556 bytes, and most of them don't show "path mtu capable".
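To check many sessions at once, the "show tcp | i segment|Option" output used earlier in the thread can be parsed. The sketch below is a hypothetical helper (`audit` is not an existing tool): it assumes each "Option Flags:" line is followed by the matching "Datagrams (max data segment is N bytes)" line, as in the snippets pasted here, and reports whether "path mtu capable" is present:

```python
import re

# Sketch: audit "show tcp | i segment|Option" output and report each
# connection's MSS plus whether the "path mtu capable" flag is set.
# ASSUMPTION: every "Option Flags:" line precedes its "Datagrams" line.
OPTION_RE = re.compile(r"^\s*Option Flags:\s*(?P<flags>.*)$")
MSS_RE = re.compile(r"max data segment is (?P<mss>\d+) bytes")

def audit(show_tcp_output: str) -> list[tuple[int, bool]]:
    results = []
    flags = None
    for line in show_tcp_output.splitlines():
        m = OPTION_RE.match(line)
        if m:
            flags = m.group("flags")
            continue
        m = MSS_RE.search(line)
        if m and flags is not None:
            results.append((int(m.group("mss")), "path mtu capable" in flags))
            flags = None  # consume the flags so they are not reused
    return results

sample = """\
Option Flags: higher precedence, nagle
Datagrams (max data segment is 556 bytes):
Option Flags: higher precedence, nagle, path mtu capable, md5
Datagrams (max data segment is 1460 bytes):
"""
for mss, pmtud in audit(sample):
    print(mss, "PMTUD" if pmtud else "no PMTUD flag")
```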

If I focus on the R1-R2 iBGP session over the STM16 link, R2 has a 4406-byte MSS and R1 a 556-byte MSS.

R1 outputs:

interface POS4/0/0
 ip address XXXXXX 255.255.255.254
 load-interval 30
 mpls traffic-eng tunnels
 mpls traffic-eng attribute-flags 0xC0000008
 mls qos trust dscp
 pos framing sdh
 pos report rdool
 pos report lais
 pos report lrdi
 pos report pais
 pos report prdi
 pos report puneq
 pos report pplm
 pos report ptim
 pos report ptiu
 pos report sd-ber
 pos flag s1 ignore
 service-policy input Policy_NetworkIngress
 ip rsvp bandwidth percent 100
end

R1#sh ip interface POS4/0/0
POS4/0/0 is up, line protocol is up
  Internet address is XXXXXX/31
  Broadcast address is 255.255.255.255
  Address determined by non-volatile memory
  MTU is 4470 bytes
  Helper address is not set
  Directed broadcast forwarding is disabled
  Multicast reserved groups joined: 224.0.0.14 224.0.0.17 224.0.0.5
  Outgoing access list is not set
  Inbound  access list is not set
  Proxy ARP is enabled
  Local Proxy ARP is disabled
  Security level is default
  Split horizon is enabled
  ICMP redirects are always sent
  ICMP unreachables are always sent
  ICMP mask replies are never sent
  IP fast switching is enabled
  IP Flow switching is disabled
  IP CEF switching is enabled
  IP CEF switching turbo vector
  IP Null turbo vector
  Associated unicast routing topologies:
        Topology "base", operation state is UP
  IP multicast fast switching is enabled
  IP multicast distributed fast switching is disabled
  IP route-cache flags are Fast, CEF, CWAN
  Router Discovery is disabled
  IP output packet accounting is disabled
  IP access violation accounting is disabled
  TCP/IP header compression is disabled
  RTP/IP header compression is disabled
  Probe proxy name replies are disabled
  Policy routing is disabled
  Network address translation is disabled
  BGP Policy Mapping is disabled
  Input features: QoS Classification, MCI Check
  Output features: HW Shortcut Installation
  Post encapsulation features: HW Shortcut Installation
  Sampled Netflow is disabled
  IP Routed Flow creation is disabled in netflow table
  IP Bridged Flow creation is disabled in netflow table
  WCCP Redirect outbound is disabled
  WCCP Redirect inbound is disabled
  WCCP Redirect exclude is disabled

R2 outputs:

interface POS4/0/0
 ip address XXXXXXX 255.255.255.254
 load-interval 30
 mpls traffic-eng tunnels
 mpls traffic-eng attribute-flags 0xC0000008
 mls qos trust dscp
 pos framing sdh
 pos report rdool
 pos report lais
 pos report lrdi
 pos report pais
 pos report prdi
 pos report puneq
 pos report pplm
 pos report ptim
 pos report ptiu
 pos report sd-ber
 pos flag s1 ignore
 aps group 4
 aps protect 1 XXXXXXXX
 aps revert 10
 service-policy input Policy_NetworkIngress
 ip rsvp bandwidth percent 100
end

C7600-BAT01-01#sh ip interface pos4/0/0
POS4/0/0 is up, line protocol is down (APS protect - inactive)
  Internet address is XXXXXX/31
  Broadcast address is 255.255.255.255
  Address determined by non-volatile memory
  MTU is 4470 bytes
  Helper address is not set
  Directed broadcast forwarding is disabled
  Multicast reserved groups joined: 224.0.0.14 224.0.0.17 224.0.0.5
  Outgoing access list is not set
  Inbound  access list is not set
  Proxy ARP is enabled
  Local Proxy ARP is disabled
  Security level is default
  Split horizon is enabled
  ICMP redirects are always sent
  ICMP unreachables are always sent
  ICMP mask replies are never sent
  IP fast switching is enabled
  IP Flow switching is disabled
  IP CEF switching is enabled
  IP CEF switching turbo vector
  IP Null turbo vector
  Associated unicast routing topologies:
        Topology "base", operation state is UP
  IP multicast fast switching is enabled
  IP multicast distributed fast switching is disabled
  IP route-cache flags are Fast, CEF, CWAN
  Router Discovery is disabled
  IP output packet accounting is disabled
  IP access violation accounting is disabled
  TCP/IP header compression is disabled
  RTP/IP header compression is disabled
  Probe proxy name replies are disabled
  Policy routing is disabled
  Network address translation is disabled
  BGP Policy Mapping is disabled
  Input features: QoS Classification, MCI Check
  Output features: HW Shortcut Installation
  Post encapsulation features: HW Shortcut Installation
  Sampled Netflow is disabled
  IP Routed Flow creation is disabled in netflow table
  IP Bridged Flow creation is disabled in netflow table
  IPv4 WCCP Redirect outbound is disabled
  IPv4 WCCP Redirect inbound is disabled
  IPv4 WCCP Redirect exclude is disabled

Thanks a lot for your help.

Kind regards

Octavio

Hi Octavio,

The MSS value can be different in each direction if there is asymmetric routing. If you enable "debug ip tcp transaction" and clear BGP between these two peers, do you get a different MSS value? With an Ethernet link and directly connected neighbors, I doubt PMTUD will kick in, as the router takes the interface MTU as the basis for the MSS calculation. I'm not sure whether this is the same for every underlying link type or slightly different for serial links. Enabling the debug above will show whether R1 is sending its capability to negotiate or not.

Thanks,

Madhu

Dear Madhu,

Sorry for my late response, and once again, thanks a lot for your help. As you will see, I found something strange.

As a reminder, here is my scenario:

I have two Cisco 7609S (R1 runs 15.2(4)S1 and R2 runs 15.3(3)S1) running iBGP between them, apart from OSPF as IGP, and connected by a STM16/OC48 SONET link.

Additionally, we set "ip tcp mss 1460" on R2.

I was only allowed to reset an MPBGP session between R1 and R2 that runs over a GRE tunnel between both routers.

As expected, both routers ended up with an MSS of 1460 bytes (R2 advertises 1460 to R1 and sets 1460 as its local MSS because of "ip tcp mss 1460", which is smaller than the 4406-byte MSS advertised by R1). Output taken from R2:

006421: Jan 13 09:42:38.069: TCB232B59D0 setting property TCP_SSO_TYPE (27) 3A413E70
006422: Jan 13 09:42:38.069: TCP: SSO already disabled for 232B59D0
006423: Jan 13 09:42:38.069: TCB232B59D0 setting property TCP_SSO_TYPE (27) 22C46A78
006424: Jan 13 09:42:38.069: TCP: SSO already disabled for 232B59D0
006425: Jan 13 09:42:38.069: TCP0: state was ESTAB -> FINWAIT1 [179 -> 10.188.188.1(64695)]
006426: Jan 13 09:42:38.069: TCP0: sending FIN
006427: Jan 13 09:42:38: %BGP-5-ADJCHANGE: neighbor 10.188.188.1 vpn vrf Mgmt Down User reset
006428: Jan 13 09:42:38: %BGP_SESSION-5-ADJCHANGE: neighbor 10.188.188.1 IPv4 Unicast vpn vrf Mgmt topology base removed from session  User reset
006429: Jan 13 09:42:38.189: TCP0: state was FINWAIT1 -> FINWAIT2 [179 -> 10.188.188.1(64695)]
006430: Jan 13 09:42:38.193: TCP0: FIN processed
006431: Jan 13 09:42:38.193: TCP0: state was FINWAIT2 -> TIMEWAIT [179 -> 10.188.188.1(64695)]
006432: Jan 13 09:42:38.397: TCB318670D0 created
006433: Jan 13 09:42:38.397: TCB318670D0 setting property TCP_VRFTABLEID (20) 227AE0A4
006434: Jan 13 09:42:38.397: TCB318670D0 setting property TCP_MD5KEY (4) 0
006435: Jan 13 09:42:38.397: TCB318670D0 setting property TCP_ACK_RATE (37) 3A5F614C
006436: Jan 13 09:42:38.397: TCB318670D0 setting property TCP_TOS (11) 3A5F6150
006437: Jan 13 09:42:38.397: TCB318670D0 setting property TCP_PMTU (45) 3A5F6110
006438: Jan 13 09:42:38.397: TCB318670D0 setting property TCP_RTRANSTMO (36) 3A5F6148
006439: Jan 13 09:42:38.397: TCP: Random local port generated 38824, network 1
006440: Jan 13 09:42:38.397: TCB318670D0 bound to 10.188.188.2.38824
006441: Jan 13 09:42:38.397: Reserved port 38824 in Transport Port Agent for TCP IP type 1
006442: Jan 13 09:42:38.397: TCP: sending SYN, seq 1477543876, ack 0
006443: Jan 13 09:42:38.397: TCP0: Connection to 10.188.188.1:179, advertising MSS 1460
006444: Jan 13 09:42:38.397: TCP0: state was CLOSED -> SYNSENT [38824 -> 10.188.188.1(179)]
006445: Jan 13 09:42:38.521: TCP0: state was SYNSENT -> ESTAB [38824 -> 10.188.188.1(179)]
006446: Jan 13 09:42:38.521: TCP: tcb 318670D0 connection to 10.188.188.1:179, peer MSS 1460, MSS is 1460
006447: Jan 13 09:42:38.521: TCB318670D0 connected to 10.188.188.1.179
006448: Jan 13 09:42:38.521: TCB318670D0 setting property TCP_NO_DELAY (0) 3A5F6148
006449: Jan 13 09:42:38.521: TCB318670D0 setting property TCP_RTRANSTMO (36) 3A5F6148
006450: Jan 13 09:42:38: %BGP-5-ADJCHANGE: neighbor 10.188.188.1 vpn vrf Mgmt Up

So far so good.
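The MSS each end finally uses can be modelled as the smaller of its own limit and the value the peer advertised in its SYN. This is a minimal sketch: `sending_mss` is a hypothetical name, and `local_limit` stands for whatever caps the local value (outgoing interface MTU minus 40, or a configured "ip tcp mss"):

```python
# Minimal sketch: each end sends segments no larger than the smaller of its
# own limit and the MSS the peer advertised during the handshake.
def sending_mss(local_limit: int, peer_advertised: int) -> int:
    """Largest TCP payload this end will send on the connection."""
    return min(local_limit, peer_advertised)

# R2 with "ip tcp mss 1460", using the figures quoted above:
print(sending_mss(1460, 4406))  # 1460 (peer value as described in the text)
print(sending_mss(1460, 1460))  # 1460 (peer value as printed by the debug)
```

With these figures, R2 ends up at 1460 whether R1 advertised 4406 or 1460. Because each end applies the rule with its own limit, the two routers can report different MSS values for the same session.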

However, two days later the MSS of the session on R1 changed from "10.188.188.2 1460    higher precedence, nagle, path mtu capable" to "10.188.188.2 556    higher precedence, nagle" (Peer, MSS, Flag options). The BGP session wasn't restarted, and R2 kept the MSS values from the last restart. It's as if the "path mtu capable" capability had been disabled and the session MSS changed to 556 bytes.

To be honest, I'm unable to find any other explanation than a bug in 15.2(4)S1.

Thanks a lot in advance

Kind regards

Octavio