Multihop BGP - Odd behaviour

PatrickAndersen · ‎07-21-2022

Hi,

I'm experiencing something I want to categorize as odd behaviour, though I'm not sure how to find out what exactly is happening.

I've attached a picture illustrating two solutions, A and B. A does not work, B does work.

A quick summary of the setup: BR01 is a border router (ASR9903) and part of an MPLS network along with AR01, an access router (NCS540). CE01 is the customer's own equipment (ISR1101); it is not part of the MPLS network and is simply connected to the Internet VRF via a /30 subnet. An eBGP session runs from BR01 through AR01 via an L3VPN VRF (Internet) and on to CE01. BR01 holds a full Internet routing table. AR01 only has a default route (0.0.0.0/0) towards BR01 plus its directly connected subnets, so it will never know of 20.0.0.0/24 specifically.

Solution A:

A multihop eBGP session is established between BR01 (loopback interface 3.3.3.3) and CE01 (directly connected interface). CE01 advertises 20.0.0.0/24 to BR01 without issues, and BR01 accepts the route and installs it in its routing table with next-hop 1.1.1.2.

Ping from BR01 to CE01 (20.0.0.1) fails with TTL expired. Traceroute from BR01 to CE01 (20.0.0.1) shows that traffic is sent to AR01, but AR01 forwards the traffic back to BR01 via its default route (presumably because it performs its own lookup for 20.0.0.0/24 instead of simply forwarding the traffic to 1.1.1.2). This is what strikes me as odd behaviour. BR01 states that the next-hop for 20.0.0.0/24 is 1.1.1.2 via AR01. AR01 receives the traffic and, instead of forwarding it on to 1.1.1.2, decides to perform its own lookup for 20.0.0.0/24. (I vaguely think this happens because AR01 considers 1.1.1.2 part of its directly connected network, but I can't figure out whether this is truly intended behaviour and haven't been able to find anything documenting it.)
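To make the suspected behaviour concrete, here is a minimal Python sketch of it. It is only a toy model of the symptom described above, not of how the routers work internally: the tables are simplified assumptions based on this post (not device output), MPLS labels are ignored, and every router simply does a longest-prefix match on the destination address. The names `TABLES`, `lookup` and `trace` are made up for illustration.

```python
import ipaddress

# Simplified, assumed routing tables: prefix -> neighbour the packet is handed to.
TABLES = {
    "BR01": {
        "20.0.0.0/24": "AR01",  # BGP route, next-hop 1.1.1.2 resolved via AR01
        "1.1.1.0/30": "AR01",
    },
    "AR01": {
        "0.0.0.0/0": "BR01",    # default route towards BR01
        "1.1.1.0/30": "CE01",   # directly connected /30 towards CE01
    },
    "CE01": {
        "20.0.0.0/24": "local",
        "1.1.1.0/30": "local",
    },
}


def lookup(router, dest):
    """Longest-prefix match on the destination address only."""
    addr = ipaddress.ip_address(dest)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in TABLES[router]
        if addr in ipaddress.ip_network(prefix)
    ]
    best = max(matches, key=lambda net: net.prefixlen)
    return TABLES[router][str(best)]


def trace(start, dest, ttl=6):
    """Follow the packet hop by hop until it is delivered or the TTL runs out."""
    path = [start]
    router = start
    while ttl > 0:
        nxt = lookup(router, dest)
        if nxt == "local":
            return path, "delivered"
        path.append(nxt)
        router = nxt
        ttl -= 1
    return path, "ttl expired"


if __name__ == "__main__":
    print(trace("BR01", "1.1.1.2"))   # the BGP next-hop itself
    print(trace("BR01", "20.0.0.1"))  # the advertised prefix
```

In this model the next-hop 1.1.1.2 is reachable through AR01, while traffic to 20.0.0.1 bounces between BR01 and AR01 until the TTL expires, which matches what the traceroute shows.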

Solution B:

A multihop eBGP session is established between BR01 (loopback interface 3.3.3.3) and CE01 (loopback interface 2.2.2.2). CE01 advertises 20.0.0.0/24 to BR01 without issues, and BR01 accepts the route and installs it in its routing table with next-hop 2.2.2.2.

Ping from BR01 to CE01 (20.0.0.1) works. Traceroute from BR01 to CE01 (20.0.0.1) shows that traffic is sent to AR01 and then forwarded on to CE01 as expected. BR01 performs a lookup on 20.0.0.0/24 and sees the next-hop as 2.2.2.2 via AR01. AR01 receives the traffic, performs a lookup for 2.2.2.2 and forwards it to CE01 via the static route. This is what I would expect of solution A as well, only without the static route, since 1.1.1.2 is known there as directly connected.
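Seen from BR01, the two solutions look the same: in both cases the BGP next-hop is resolved recursively and ends up pointing at AR01. A rough Python sketch of that recursive resolution, assuming BR01 learns both 1.1.1.0/30 and 2.2.2.2/32 via AR01 (the RIB contents and the helpers `rib_for` and `resolve` are hypothetical, not real device output):

```python
import ipaddress


def rib_for(bgp_next_hop):
    """Assumed, simplified view of BR01's Internet VRF for one solution."""
    return {
        "20.0.0.0/24": {"next_hop": bgp_next_hop},  # BGP route from CE01
        "1.1.1.0/30": {"via": "AR01"},              # /30 between AR01 and CE01
        "2.2.2.2/32": {"via": "AR01"},              # CE01 loopback (static on AR01)
    }


def resolve(rib, dest, depth=0):
    """Resolve dest recursively until a route names a neighbour ('via')."""
    if depth > 8:
        raise RuntimeError("next-hop recursion too deep")
    addr = ipaddress.ip_address(dest)
    candidates = [
        (ipaddress.ip_network(prefix), route)
        for prefix, route in rib.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None  # unresolvable
    _, route = max(candidates, key=lambda c: c[0].prefixlen)
    if "via" in route:
        return route["via"]
    return resolve(rib, route["next_hop"], depth + 1)


if __name__ == "__main__":
    print("Solution A:", resolve(rib_for("1.1.1.2"), "20.0.0.1"))
    print("Solution B:", resolve(rib_for("2.2.2.2"), "20.0.0.1"))
```

Both calls return AR01, so under these assumptions the difference between A and B is not in how BR01 resolves the route but in what AR01 does with the traffic once it arrives.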

I initially suspected a bug on the Cisco NCS540 due to previous experiences with the model. However, if I replace AR01 with a Cisco ASR920, the same behaviour occurs, which made me wonder whether it's "normal" behaviour for Cisco (whether intended or not). I haven't been able to test with routers from other vendors.

I sincerely hope someone is able to provide a reasonable explanation for the behaviour.

Best Regards,

Patrick Andersen

MHM Cisco World · ‎07-21-2022

For both Op-A and Op-B, check the output of
show mpls forwarding table
I think that in Op-A, which does not work, there is no label for the loopback 2.2.2.2/32,
and that is why the traffic is dropped.

PatrickAndersen · ‎07-21-2022

Hi MHM,

In Op-A, 2.2.2.2/32 is not part of the solution. The multihop EBGP session is established with CE01's directly connected IP 1.1.1.2/30.

Best Regards,

Patrick Andersen