XR VRF BGP RST - ?

Garry Peirce · ‎01-16-2020

Curious if anyone might have some insight on a BGP session setup problem I'm seeing.

I'm having trouble bringing up a BGP peering session between IOS-XR VRF and IOS.

[ IOSXR_PeerA (VRF)-----IOSXR(Global) ]---(internal network, 3 hops)---IOS_PeerB

PeerA is an XR host running 6.5.3 with other working (global table) BGP peerings.

This new peer is within a VRF. Its router-ID and update-source are both the 'PeerA' address, which is within the VRF.

PeerB is an IOS router (15.8(3)). Its router-ID and update-source are both the 'PeerB' address.

I can successfully ping and SSH to PeerB from within PeerA's VRF using PeerA's addr as the source.

I can successfully ping and SSH to PeerA from PeerB using PeerB's addr as the source.

There seems to be no problem with TCP in general between the peers.

However, the BGP session will not come up.

I've packet captured at the IOS end and the sequence below repeats.

The IOS side initiates with a SYN and IOSXR immediately responds with an RST.

In the trace, all packet addressing is as expected, matching the peer address configurations.

Normal TCP options in SYN, window size=0, MSS=1240

It's not getting to the BGP negotiation stage; XR is just sending RSTs immediately.

The PeerB side basically sees this from a TCP perspective:

Reserved port 43637 in Transport Port Agent for TCP IP type 1
TCB293A4544 getting property TCP_STRICT_ADDR_BIND (19)
TCP: pmtu enabled,mss is now set to 1460
TCP0: Connection to [PeerA]:179, advertising MSS 1460
TCP0: state was CLOSED -> SYNSENT [43637 -> [PeerA](179)]
Released port 43637 in Transport Port Agent for TCP IP type 1 delay 240000
TCP0: state was SYNSENT -> CLOSED [43637 -> [PeerA](179)]
TCP0: bad seg from [PeerA]-- closing connection: port 43637 seq 0 ack 273838304 rcvnxt 0 rcvwnd 0 len 0
TCP0: connection closed - remote sent RST
TCB 0x293A4544 destroyed

I can't seem to find XR complaining about much other than the following, which appears to recur on each attempt.

"show tcp" => 0x00007f9138013db8 0x60000003 0 0 [PeerA]:30751 ,[PeerB]:179 SYNSENT

"show tcp packet-trace pcb" => shows no apparent error

"show bgp trace" =>

default-bgp/spkr-tr2-issu 0/RSP1/CPU0 t11907 [ISSU]:978: Calling bgp_sock_error with reason 1 for nbr [PeerB]

The only reasons I've found for a BGP TCP RST are that the router-ID is not set or that the expected peer address does not match, but both are correct here. Curious if anyone may have any thoughts, may have seen similar behavior before, or perhaps knows what XR's "reason 1" means here ;-)

TIA,

decode.chr13 · ‎01-16-2020

Hello,

Can you telnet PeerA -> PeerB:179?

Can you telnet PeerB -> PeerA:179?

Garry Peirce · ‎01-16-2020

ok, good thought. After opening up telnet on vtys to test :

Note: anonymized IPs and ASNs.

Telnet attempt from A-> B seems to work. (A = XR , B = IOS)

Trying 10.0.0.13...
Use specified source interface(Loopback99).
Use 10.0.0.10 as local address.
Connected to 10.0.0.13.
Escape sequence is '^^q'.

Telnet attempt from B->A

Trying 10.0.0.10, 179 ...
% Connection refused by remote host
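
For anyone repeating this test from a host rather than a router: "Connection refused" means the far end answered the SYN with an RST (as in the packet capture), which is different from the SYN being silently dropped. Below is a minimal Python sketch of the same check. The function name `probe_tcp` and its return labels are made up for this illustration and are not part of any Cisco tooling.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt to host:port.

    Returns one of:
      "open"    - three-way handshake completed
      "refused" - the SYN was answered with an RST
      "timeout" - no answer within `timeout` seconds
      "error"   - any other socket error (e.g. no route to host)
    """
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError:
        return "error"
```

Calling `probe_tcp("10.0.0.10", 179)` from a host that routes to PeerA would be the equivalent of the B->A telnet above; a "refused" result points at the receiving stack rejecting the SYN rather than at a filter silently dropping it.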

And for added reference:

Peer A:

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.0.0.10 4 xxx 0 0 1 0 0 never Idle

Peer B

Neighbor Spk AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down St/PfxRcd
10.0.0.13 0 xxx 0 0 0 0 0 00:00:00 Active

tkarnani · ‎01-16-2020

Hello,

can we check

show tcp brief | inc 179 << are we listening?

show bgp vrf XXX neighbor xxxx detail << please check, correct src interface, router-id, any error present

show bgp vrf XXX neighbor xxxx decoded-message-log <<< rfc 1771 sec 4.5 has the code values

the above output should give us a clue

Thanks

Garry Peirce · ‎01-16-2020

yes, I was just looking around on XR TCP sockets..

I see :

RP/0/RSP1/CPU0:#sh tcp br | inc :179.*LISTEN
Thu Jan 16 14:46:21.654 EST
0x00007f9140028f18 0x60000000 0 0 :::179 :::0 LISTEN
0x00007f912c0107c8 0x60000003 0 0 :::179 :::0 LISTEN
0x00007f9140024088 0x00000000 0 0 :::179 :::0 LISTEN
0x00007f912401c3e8 0x60000000 0 0 0.0.0.0:179 0.0.0.0:0 LISTEN
0x00007f91380134b8 0x60000003 0 0 0.0.0.0:179 0.0.0.0:0 LISTEN
0x00007f9124015518 0x00000000 0 0 0.0.0.0:179 0.0.0.0:0 LISTEN

The one in bold seems to contain all the other active BGP IPv4 peers (the 2nd two v4 PCBs have none).

Interestingly the peer A address of interest (10.0.0.10) is not listed in any.

Must be XR's VRF operation/config resulting in it not listening here.
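
To compare the listeners more easily, the output can be grouped by its second column. This is a rough Python sketch, assuming the columns are PCB, VRF-ID, Recv-Q, Send-Q, local address, foreign address and state (the header line was filtered out by the `inc` above, so that layout is an assumption). `listeners_by_vrf` is a hypothetical helper name.

```python
from collections import defaultdict

# The six LISTEN lines pasted above, without the timestamp line.
SAMPLE = """\
0x00007f9140028f18 0x60000000 0 0 :::179 :::0 LISTEN
0x00007f912c0107c8 0x60000003 0 0 :::179 :::0 LISTEN
0x00007f9140024088 0x00000000 0 0 :::179 :::0 LISTEN
0x00007f912401c3e8 0x60000000 0 0 0.0.0.0:179 0.0.0.0:0 LISTEN
0x00007f91380134b8 0x60000003 0 0 0.0.0.0:179 0.0.0.0:0 LISTEN
0x00007f9124015518 0x00000000 0 0 0.0.0.0:179 0.0.0.0:0 LISTEN
"""


def listeners_by_vrf(text, port=179):
    """Group LISTEN sockets on `port` by the (assumed) VRF-ID column."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip timestamps, prompts and anything that is not a LISTEN row.
        if len(fields) != 7 or fields[6] != "LISTEN":
            continue
        _pcb, vrf_id, _recvq, _sendq, local, _foreign, _state = fields
        # Split on the last colon so IPv6 "::" stays intact.
        addr, _, local_port = local.rpartition(":")
        if local_port == str(port):
            groups[vrf_id].append(addr)
    return dict(groups)
```

`listeners_by_vrf(SAMPLE)` returns one entry per ID, each with a wildcard IPv6 and a wildcard IPv4 listener. That makes it easy to line the IDs up against the one shown on the SYNSENT socket in the first post (0x60000003).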

I'll answer @tkarnani's command output in next reply...

Garry Peirce · ‎01-16-2020

RP/0/RSP1/CPU0:#show bgp vrf xxx neighbor 10.0.0.13 detail | inc ID
Thu Jan 16 14:55:52.186 EST
Remote router ID 0.0.0.0 ** this would not be known until after 3-way handshake though , right?

....

Local host: 10.0.0.10, Local port: 0, IF Handle: 0x00000000
Foreign host: 10.0.0.13, Foreign port: 0

RP/0/RSP1/CPU0:#show bgp vrf xxx neighbor 10.0.0.13 decoded-message-log
<results in no output>
RP/0/RSP1/CPU0:#

tkarnani · ‎01-16-2020

based on the output you have provided the xr device does not seem to even start a session or receive anything from the other side.

nothing silly like ttl? (ebgp or ttl security?)

can we ping fine in both directions using the source IPs?

i can take a look on webex tomorrow, send me an email, my username @ cisco.com

Garry Peirce · ‎01-17-2020

Spidey sense thinking this issue is XR LPTS related.

RP/0/RSP1/CPU0:#show lpts pifib entry brief location 0/0/CPU0 | in 10.0.0.10 *All looks correct ; at least system aware
Fri Jan 17 09:09:44.963 EST
IPv4 <VRFname> TCP any 0/RSP1/CPU0 10.0.0.10,47333 10.0.0.13,179
IPv4 <VRFname> TCP any 0/RSP1/CPU0 10.0.0.10,179 10.0.0.13

RP/0/RSP1/CPU0:#show lpts pifib hardware police location 0/0/CPU0 | in BGP *Showing 0 drops for anything
Fri Jan 17 09:09:09.857 EST
BGP-known 6 Static 2500 2500 36999800 0 01234567
BGP-cfg-peer 7 Static 2000 2000 1578 0 01234567
BGP-default 8 Static 1500 1500 650142 0 01234567

HAH! via 'debug lpts packet slow-path drops' - now why? I've no idea..

RP/0/RSP1/CPU0:#RP/0/RSP1/CPU0:Jan 17 09:34:49 : netio[131]: lpts ifib [0x63c617f8/60 if 0x06000140 IP4 10.0.0.13 -> 10.0.0.10 TCP frag] to local stack for reject (LT BGP4_FM), dropping

decode.chr13 · ‎01-16-2020

If telnet doesn't work from B->A, then you must have an access-list/firewall on the way.
I suppose the XR interface doesn't have an access-list on it....

decode.chr13 · ‎01-17-2020

Did you solve this?

Garry Peirce · ‎01-17-2020

It's still an open issue and to answer another early question here, no, there are no FW nor ACLs in the path.

tkarnani · ‎01-17-2020

Hi Garry,

Thanks for your time on webex.

here is the fun part, we are route leaking on the asr9k, bgp vrf to global and from global to vrf.

the ip address for the vrf exists only on a loopback, no physical ports. we have no mpls just ipv4.

from the debug you provided in the earlier post, we see the packets arrive in the global table and get dropped in lpts.

we need to figure out a way to have these packets be in the vrf. either hit an interface in the vrf, possibly ABF could do this? not 100% sure. something that needs to be tested. "ABF VRF select"

https://community.cisco.com/t5/service-providers-documents/asr9000-xr-abf-acl-based-forwarding/ta-p/3153403

whenever you open a tac case i can take ownership and work with you on this

thanks!

tkarnani · ‎02-08-2020

Hi Team,

just to close on this.

PE1 <> P <> PE2

PE2 is trying to peer its global loopback with a loopback on PE1 in VRF A.

the physical links were in global space, not vrf. the challenge was the underlay was pure ipv4.

bgp packets were reaching PE1, but since they were received in the global table and not placed in the vrf, they were being dropped.

originally i thought ABF would help move these packets between global/vrf, however ABF does not touch locally processed packets, just through traffic.

there are really 3 options to fix this.

1. enable l3vpn, this would assign an mpls label per packet, router would know based on labels to move this to vrf.

2. carve out a vrf sub interface all the way through

3. the easiest: build a gre tunnel between the routers and on PE1 place it in the vrf.
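
The reason all three options work can be pictured with a toy lookup in Python. This is only an illustration of the idea that the receive-side match includes the table the packet arrived in; it is not how LPTS is actually implemented, and the interface names, the VRF name `VRF_A` and the function `classify` are all hypothetical.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    ingress_if: str  # interface the packet arrived on
    src: str
    dst: str
    dport: int


# Hypothetical mapping of ingress interface to routing table.
IF_TABLE = {
    "TenGigE0/0/0/0": "default",  # physical link in the global table
    "tunnel-ip1": "VRF_A",        # GRE tunnel placed in the vrf (option 3)
}

# Toy receive entries: (table, local address, local port, peer address).
# The BGP neighbor is configured under the vrf, so the entry lives there.
ENTRIES = {("VRF_A", "10.0.0.10", 179, "10.0.0.13")}


def classify(pkt):
    """Return "deliver" if the packet matches a receive entry, else "reject"."""
    table = IF_TABLE[pkt.ingress_if]
    key = (table, pkt.dst, pkt.dport, pkt.src)
    return "deliver" if key in ENTRIES else "reject"
```

In this model a SYN from 10.0.0.13 to 10.0.0.10:179 arriving on the global physical link is rejected, while the same SYN arriving over the tunnel in the vrf is delivered. Options 1 and 2 change the ingress table in the same way, by label or by a vrf sub-interface.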