Re: IOS-XR NCS-57B1: IPv6 Reachability problems

swahlmark · ‎09-30-2025

TL;DR: After migrating from ASR9Ks to NCS-57B1s (IOS-XR 7.10.2), four IPv6 PNIs to the same provider flap across four different devices. ND stays REACH, but IPv6 echo and BGP TCP/179 stall. Clearing the IPv6 neighbor cache brings it back for ~4–5 minutes, then it dies again. IPv4 PNIs and IPv6 to other peers are clean.

Environment

Platform: Cisco NCS-57B1 (IOS-XR) running 7.10.2
Interface: Bundle-Ether (single active member, LACP up)
IPv6 MTU: 1500 available (interface MTU 1514)

Symptoms (repeatable on multiple 57B1s / ports)

v6 BGP establishes only after clear ipv6 neighbors …, then drops ~4–5 minutes later.
TCP to 179: show tcp brief frequently shows CLOSED; at times SYNSENT (we send SYNs, no SYN-ACK back).
ND view: show ipv6 neighbors keeps the Google GUA at REACH the whole time.
BGP: last reset reason seen earlier was hold time expired

Configurations are unchanged from the original devices after the migration.

BGP Configuration:

remote-as
timers 15 45
password *
description *
update-source Bundle-Ether11
address-family ipv6 unicast
 route-policy rpl-google1-pni-in in
 route-policy rpl-google1-pni-out out

Harold Ritter · ‎10-01-2025

Hi @swahlmark ,

It could be an issue with the maximum packet size the link can handle. Could you provide the log messages you received when the BGP session was reset?

You can determine the maximum packet size the link can handle with the following command:

ping <neighbor ip address> size 1500 df-bit

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

swahlmark · ‎10-02-2025

@Harold Ritter Thank you for the reply.

Bundle MTU: 1514 bytes

ping tests:

ping <IP> size 1500 df-bit
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to <IP> timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

ping <IP> size 1501 df-bit
Type escape sequence to abort.
Sending 5, 1501-byte ICMP Echos to <IP> timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Here is the log message: hold time expired. BGP drops when IPv6 reachability dies.

RP/0/RP0/CPU0:Sep 29 15:47:56.857 EDT: bgp[1090]: %ROUTING-BGP-5-ADJCHANGE : neighbor <IP> Down - BGP Notification sent, hold time expired (VRF: default) (AS: #)
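
A quick way to read the two ping results: on IOS-XR the interface MTU includes the 14-byte Ethernet header, so a 1514-byte bundle MTU leaves 1500 bytes for the IP packet. That matches 1500 passing and 1501 failing with the DF bit set. The minimal Python sketch below only checks that arithmetic. It assumes untagged Ethernet II framing and that the ping size is the full IP datagram; the function names are made up for illustration.

```python
# Sizes in bytes.
ETH_HEADER = 14  # assumption: untagged Ethernet II, no VLAN tag


def l3_mtu(interface_mtu: int, l2_overhead: int = ETH_HEADER) -> int:
    """Largest IP packet that fits in one frame for the given interface MTU."""
    return interface_mtu - l2_overhead


def fits_with_df(packet_size: int, mtu: int) -> bool:
    """True if a packet of packet_size bytes can be sent unfragmented."""
    return packet_size <= mtu


mtu = l3_mtu(1514)  # bundle MTU reported in the thread
for size in (1500, 1501):
    verdict = "fits" if fits_with_df(size, mtu) else "too big"
    print(f"{size}-byte datagram: {verdict}")
```

So the link itself carries full 1500-byte packets, which makes a plain interface MTU mismatch look unlikely on its own.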

Harold Ritter · ‎10-02-2025

Hi @swahlmark ,

I see that you are using BGP MD5 authentication. I have seen several bugs related to this feature.

Can you provide the output of the following command while the BGP session is up?

show tcp det pcb <PCB for the BGP session> | i Datagrams

PS: Use the "show tcp brief" command to find the PCB for the BGP session.

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

swahlmark · ‎10-02-2025

Hi,

Datagrams (in bytes): MSS 1392, peer MSS 1392, min MSS 1440, max MSS 1440
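
For context on these numbers: with a 1500-byte IPv6 MTU, the largest MSS without TCP options is 1500 - 40 (IPv6 header) - 20 (TCP header) = 1440, which matches the min/max MSS shown. The negotiated MSS of 1392 is 48 bytes lower. The sketch below is just that arithmetic in Python. The helper names are hypothetical, and it does not try to explain where the 48 bytes go, since the thread does not establish that.

```python
# Sizes in bytes.
IPV6_HEADER = 40
TCP_HEADER = 20  # base TCP header, no options


def max_mss(l3_mtu: int) -> int:
    """Largest TCP payload per segment for a given IPv6 MTU (no TCP options)."""
    return l3_mtu - IPV6_HEADER - TCP_HEADER


def packet_size_for(mss: int) -> int:
    """IPv6 packet size produced by a full-sized segment at this MSS."""
    return mss + IPV6_HEADER + TCP_HEADER


expected = max_mss(1500)        # ceiling for a 1500-byte IPv6 MTU
negotiated = 1392               # "MSS" / "peer MSS" from the output above
gap = expected - negotiated     # bytes unaccounted for
print(expected, packet_size_for(negotiated), gap)
```

A full segment at MSS 1392 yields a 1452-byte IPv6 packet, which is still below the 1500 bytes that the ping test showed the link can carry.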

Harold Ritter · ‎10-02-2025

Hi @swahlmark ,

This output looks weird indeed. Do you know what NOS the peering router is running?

To confirm that this is an MSS issue, as I suspect, you could configure the MSS manually on your side and see if it fixes the issue.

You can use the following command to do that:

router bgp <asn>
 neighbor <peer address>
  tcp mtu 1300

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

swahlmark · ‎10-03-2025

Hi @Harold Ritter ,

I believe the peering router's NOS is gNOs. It's a Google-owned router.
I configured tcp mss 1300, it wouldn't allow tcp mtu 1300.
I cleared the IPv6 neighbors on the interface. BGP remained established for about 5 minutes again until IPv6 reachability died.

Harold Ritter · ‎10-03-2025

Hi @swahlmark ,

> I believe the peering router's NOS is gNOs. It's a Google-owned router.

Thanks for the info.

> I configured tcp mss 1300, it wouldn't allow tcp mtu 1300.

My mistake. "tcp mss" is the command I was referring to.

> BGP remained established for about 5 minutes again until IPv6 reachability died.

Are there any other log messages that you see before the session closes?

The next step would be to open a case with the Google Cloud people to further troubleshoot this issue.

Regards,
Harold Ritter, CCIE #4168 (EI, SP)

swahlmark · ‎10-03-2025

@Harold Ritter

Here are the log messages. There is only one other message before the session closes.

RP/0/RP0/CPU0:Oct 3 08:39:29.633 EDT: bgp[1090]: %ROUTING-BGP-5-ADJCHANGE : neighbor <IP> Up (VRF: default) (AS: #)
RP/0/RP0/CPU0:Oct 3 08:42:22.589 EDT: fib_mgr[401]: %OS-MMAP_PEER-7-CONNECT : Connect from process 5146 to 51151 skipped: Connection refused
RP/0/RP0/CPU0:Oct 3 08:44:46.858 EDT: bgp[1090]: %ROUTING-BGP-5-ADJCHANGE : neighbor <IP> Down - BGP Notification sent, hold time expired (VRF: default) (AS: 15169)

I've had a case open with Google for a few weeks; they've been going in circles and are not sure what the issue is. I also opened a TAC case, but there has not been much movement on it. The four sessions that are currently down were up when they were on the two original ASR9K routers. Once they were moved to the four separate NCS-57B1s, they went down. I'm assuming it's related to the platform, since the configurations did not change. Thank you for looking into it.
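
To line the Oct 3 log timestamps up with the "~4-5 minutes" symptom: the Down message fires when the local hold timer expires, so the last KEEPALIVE or UPDATE from the peer arrived roughly one hold time before it. The Python sketch below works that out from the Up and Down lines. It assumes the negotiated hold time is the configured 45 seconds (from "timers 15 45"); if the peer proposed a lower value, the window shifts accordingly. Variable and function names are illustrative only.

```python
from datetime import datetime, timedelta

# Assumption: negotiated hold time equals the locally configured 45 s.
HOLD_TIME = timedelta(seconds=45)


def parse_time(ts: str) -> datetime:
    """Parse the time-of-day field of an IOS-XR syslog line, e.g. '08:39:29.633'."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


up = parse_time("08:39:29.633")    # ADJCHANGE ... Up
down = parse_time("08:44:46.858")  # ADJCHANGE ... Down, hold time expired

session_lifetime = down - up
last_message_heard = down - HOLD_TIME      # approximate time of the last BGP message received
healthy_window = last_message_heard - up   # how long the peer was actually heard from

print("session lifetime:", session_lifetime)
print("peer heard for about:", healthy_window)
```

Under that assumption the session lasted 5 min 17 s in the log, but the peer was only heard from for about 4 min 32 s after the neighbor cache was cleared, which is consistent with the window described in the first post.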