09-30-2025 08:37 AM - edited 09-30-2025 08:39 AM
TL;DR: After migrating from ASR9Ks to NCS-57B1s (IOS-XR 7.10.2), four IPv6 PNIs to the same provider flap across four different devices. ND stays REACH, but IPv6 echo and BGP TCP/179 stall. Clearing the IPv6 neighbor cache brings it back for ~4–5 minutes, then it dies again. IPv4 PNIs and IPv6 to other peers are clean.
Platform: Cisco NCS-57B1 (IOS-XR) running 7.10.2
Interface: Bundle-Ether (single active member, LACP up)
IPv6 MTU: 1500 available (interface MTU 1514)
v6 BGP establishes only after clear ipv6 neighbors …, then drops ~4–5 minutes later.
TCP to 179: show tcp brief frequently shows CLOSED; at times SYNSENT (we send SYNs, no SYN-ACK back).
ND view: show ipv6 neighbors keeps the Google GUA at REACH the whole time.
BGP: last reset reason seen earlier was hold time expired
Configurations are unchanged from the original devices after the migration.
BGP Configuration:
remote-as
timers 15 45
password *
description *
update-source Bundle-Ether11
address-family ipv6 unicast
 route-policy rpl-google1-pni-in in
 route-policy rpl-google1-pni-out out
10-01-2025 03:32 PM - edited 10-01-2025 03:51 PM
Hi @swahlmark ,
It could be an issue with the maximum packet size the link can handle. Could you provide the log messages you received when the BGP session reset?
You can determine the maximum packet size the link can handle with the following command:
ping <neighbor ip address> size 1500 df-bit
10-02-2025 08:34 AM - edited 10-02-2025 08:34 AM
@Harold Ritter Thank you for the reply.
Bundle MTU: 1514 bytes
ping tests:
ping <IP> size 1500 df-bit
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to <IP> timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
ping <IP> size 1501 df-bit
Type escape sequence to abort.
Sending 5, 1501-byte ICMP Echos to <IP> timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Here is the log message: hold time expired. BGP drops when IPv6 reachability dies.
RP/0/RP0/CPU0:Sep 29 15:47:56.857 EDT: bgp[1090]: %ROUTING-BGP-5-ADJCHANGE : neighbor <IP> Down - BGP Notification sent, hold time expired (VRF: default) (AS: #)
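The two pings above bracket the limit by hand (1500 passes, 1501 fails). When the limit is not known in advance, the same search can be done as a binary search over the size. The sketch below is only an illustration: `probe` is a hypothetical stand-in for "run one df-bit ping at this size and report success", and `simulated_probe` simply mimics the result shown above rather than touching a router.

```python
from typing import Callable


def largest_passing_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes probe is monotonic: every size up to some limit passes and
    every size above it fails. Returns low - 1 if no size passes.
    """
    best = low - 1
    while low <= high:
        mid = (low + high) // 2
        if probe(mid):
            best = mid        # mid passes, so look for something larger
            low = mid + 1
        else:
            high = mid - 1    # mid fails, so the limit is below it
    return best


# Hypothetical probe that mimics the ping output above:
# sizes up to 1500 bytes succeed, anything larger is dropped.
def simulated_probe(size: int) -> bool:
    return size <= 1500


print(largest_passing_size(simulated_probe, 1200, 9000))  # 1500
```

With a real probe, each call would cost one ping run, so the search over 1200 to 9000 needs about 13 probes instead of thousands.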
10-02-2025 09:51 AM - edited 10-02-2025 01:55 PM
Hi @swahlmark ,
I see that you are using BGP MD5 authentication. I have seen several bugs related to this feature.
Can you provide the output of the following command while the BGP session is up?
show tcp det pcb <PCB for the BGP session> | i Datagrams
PS: Use the "show tcp brief" command to find the PCB for the BGP session.
10-02-2025 10:49 AM
Hi,
Datagrams (in bytes): MSS 1392, peer MSS 1392, min MSS 1440, max MSS 1440
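For reference, the 1440 figure in that output is what a 1500-byte IPv6 MTU allows once the fixed 40-byte IPv6 header and the 20-byte base TCP header are subtracted. The sketch below only does that arithmetic and shows how far the reported MSS of 1392 sits below it. It assumes no IPv6 extension headers, and it does not try to explain where the gap comes from.

```python
IPV6_HEADER_BYTES = 40  # fixed IPv6 header, assuming no extension headers
TCP_HEADER_BYTES = 20   # base TCP header, options not counted


def max_tcp_mss(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one IPv6 packet of the given MTU."""
    return ip_mtu - IPV6_HEADER_BYTES - TCP_HEADER_BYTES


ipv6_mtu = 1500      # IPv6 MTU from the original post
reported_mss = 1392  # "MSS 1392, peer MSS 1392" from the output above

ceiling = max_tcp_mss(ipv6_mtu)
gap = ceiling - reported_mss

print(ceiling)  # 1440, matching "min MSS 1440, max MSS 1440"
print(gap)      # 48 bytes below the ceiling
```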
10-02-2025 04:29 PM
Hi @swahlmark ,
This output looks weird indeed. Do you know what NOS the peering router runs?
To confirm this is an MSS issue as I think it is, you could configure the MSS manually on your side and see if it fixes the issue.
You can use the following command to do that:
router bgp <asn>
 neighbor <peer address>
  tcp mtu 1300
10-03-2025 05:48 AM
Hi @Harold Ritter ,
I believe the peering router's NOS is gNOs. It's a Google-owned router.
I configured tcp mss 1300, it wouldn't allow tcp mtu 1300.
I cleared IPv6 neighbors on the interface. BGP remained established for about 5 minutes again until IPv6 reachability died.
10-03-2025 06:45 AM
Hi @swahlmark ,
> I believe the peering router's NOS is gNOs. It's a Google-owned router.
Thanks for the info.
> I configured tcp mss 1300, it wouldn't allow tcp mtu 1300.
My mistake. "tcp mss" is the command I was referring to.
> BGP remained established for about 5 minutes again until IPv6 reachability died.
Are there any other log messages that you see before the session closes?
The next step would be to open a case with the Google Cloud people to troubleshoot this issue further.
10-03-2025 07:21 AM
These are the log messages. There is only one other message before the session closes.
RP/0/RP0/CPU0:Oct 3 08:39:29.633 EDT: bgp[1090]: %ROUTING-BGP-5-ADJCHANGE : neighbor <IP> Up (VRF: default) (AS: #)
RP/0/RP0/CPU0:Oct 3 08:42:22.589 EDT: fib_mgr[401]: %OS-MMAP_PEER-7-CONNECT : Connect from process 5146 to 51151 skipped: Connection refused
RP/0/RP0/CPU0:Oct 3 08:44:46.858 EDT: bgp[1090]: %ROUTING-BGP-5-ADJCHANGE : neighbor <IP> Down - BGP Notification sent, hold time expired (VRF: default) (AS: 15169)
I've had a case open with Google for a few weeks; they've been going in circles and are not sure what the issue is. I also had a TAC case opened, with not much movement on that. The four sessions that are currently down were up on the original two ASR9K routers. Once they were moved to the four separate NCS-57B1s, they went down. I'm assuming it's related to the platform, since the configurations did not change. Thank you for looking into it.
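The Up and Down timestamps in the log above can be combined with the configured `timers 15 45` to estimate when the last BGP message from the peer was actually received. The sketch below is a rough calculation, assuming that 45 seconds is also the negotiated hold time and that the Down message is logged at the moment the hold timer expires. The year 2025 is taken from the post dates, since the syslog lines do not carry one.

```python
from datetime import datetime, timedelta

LOG_FORMAT = "%Y %b %d %H:%M:%S.%f"


def parse_log_time(stamp: str, year: int = 2025) -> datetime:
    """Parse a syslog-style stamp such as 'Oct 3 08:39:29.633'."""
    return datetime.strptime(f"{year} {stamp}", LOG_FORMAT)


session_up = parse_log_time("Oct 3 08:39:29.633")    # ADJCHANGE ... Up
session_down = parse_log_time("Oct 3 08:44:46.858")  # ADJCHANGE ... Down
hold_time = timedelta(seconds=45)                    # from "timers 15 45" (assumed negotiated)

# Total time the session was Established according to the log.
uptime = session_down - session_up

# The hold timer restarts on every message from the peer, so the last
# message must have arrived one hold time before the expiry.
last_heard = session_down - hold_time
healthy_for = last_heard - session_up

print(uptime.total_seconds())       # 317.225
print(healthy_for.total_seconds())  # 272.225
print(last_heard.time())            # 08:44:01.858000
```

Under those assumptions the peer went silent roughly 4 minutes 32 seconds after the session came up, which lines up with the ~4–5 minutes of IPv6 reachability seen after each neighbor cache clear.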