eBGP Re-Establishing Constantly with Palo Alto Peer

edwardschubauer · 02-13-2025

I have two Firepower 1120 FWs in an HA pair. I have two IPSec tunnels running successfully on those Firepowers to a Palo Alto firewall, and I also have BGP established over them. The Firepowers are in Mexico and the PA firewalls are in the USA. Routes exchange fine when BGP isn't re-establishing. Mexico uses a different public IP address for each tunnel. Mexico's secondary link is 400 Mbps and its primary is 100 Mbps.

When watching the live runtime stats on the Palo Alto, you can see the timers resetting back to 0 every 5-10 minutes. It's especially noticeable when I RDP into a Mexican VM over the tunnel and lose connection to it. My co-worker is trying to run VEEAM backups over this tunnel and they never finish. The Cisco Firepower public IPs connect to a single public NAT address that I use on the USA side, which translates to an internal address.

The Palo Alto logs say "BGP peer session left established state." Then the IPSec tunnel rebuilds. If we do not RDP into any machines in Mexico or run VEEAM backups, the sessions stay established much longer. Right now they are going on 3 hours.

How do I debug further or check system logs for BGP events on these Cisco Firepowers? I usually work with Palo Alto and FortiGate firewalls. This has a high impact on my business operations.

Torbjørn · 02-14-2025

This is difficult to answer without more information. See the following article for how to check the relevant logs on the Firepowers: https://www.cisco.com/c/en/us/td/docs/security/firepower/630/configuration/guide/fpmc-config-guide-v63/firepower_threat_defense_vpn_troubleshooting.html

Get in touch: https://torbjorn.dev

MHM Cisco World · 02-15-2025

Two IPsec tunnels from two FWs in HA?

That's wrong.

You need to run only one IPsec tunnel on the Palo Alto, and it must point to the active FW.

MHM

paul driver · 02-15-2025

Hello
So you're saying the BGP peering running over the IPsec tunnels is flapping?
What are you using for the peer addressing?

Do you get any notification of the flapping?
As a test (if applicable on the FWs), from the CLI try telnetting to the BGP peer on port 179, sourcing from the adjacent peer IP. Is this successful, or is it intermittent?
Check your route tables for intermittent route withdrawals.
Also make sure you are not running into recursive routing, i.e. that the peering source is not being re-advertised back over BGP when the session establishes.
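As a rough stand-in for the telnet test, here is a minimal Python sketch that repeatedly opens a TCP connection to port 179 and records whether each handshake succeeded, so intermittent failures show up as a pattern. It assumes you run it on a host that can reach the BGP peer through the tunnel and, if you pass `source`, that the host actually owns that source address. `probe_tcp` is a hypothetical helper, not a Firepower or Palo Alto feature, and a refused or timed-out attempt may simply mean the peer does not accept connections from that source.

```python
import socket
import time


def probe_tcp(host, port=179, source=None, attempts=5, interval=1.0, timeout=2.0):
    """Hypothetical helper: attempt a TCP handshake `attempts` times and
    return a list of booleans (True = connection accepted)."""
    results = []
    # Bind to a specific local IP (with an ephemeral port) only if one is given.
    source_address = (source, 0) if source else None
    for i in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout,
                                          source_address=source_address):
                results.append(True)
        except OSError:
            # Covers refused, unreachable and timed-out attempts.
            results.append(False)
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

For example, `probe_tcp("<peer tunnel IP>", attempts=60)` run while a VEEAM job is active would show whether TCP/179 reachability drops at the same moments the BGP timers reset.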


Kind Regards
Paul

edwardschubauer · 02-17-2025

For the two tunnels, the addressing is as follows:
Local = 10.183.255.8/31 and Remote = 10.183.255.9/31
Local = 10.183.255.14/31 and Remote = 10.183.255.15/31
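A quick sanity check for /31 point-to-point addressing (the same kind of typo raised later in the thread) is to confirm that both ends of each tunnel fall in the same /31. A minimal Python sketch using the standard `ipaddress` module; `same_p2p_subnet` is a hypothetical helper name.

```python
import ipaddress


def same_p2p_subnet(local, remote):
    """Return True if two interface addresses (CIDR form) share one network."""
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    # Both ends must resolve to the same network and must not be the same host.
    return a.network == b.network and a.ip != b.ip


tunnels = [
    ("10.183.255.8/31", "10.183.255.9/31"),
    ("10.183.255.14/31", "10.183.255.15/31"),
]
for local, remote in tunnels:
    print(local, remote, same_p2p_subnet(local, remote))
```

Both pairs above print True; the mistyped pair 10.183.255.14/31 and 10.183.255.10/31 would return False.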

We confirmed that the tunnels do not go down when BGP re-establishes. We determined that with tunnel monitoring on the Palo Alto side and by using the CLI on the active Cisco firewall.

We are beginning to wonder if there is some sort of fragmentation or MTU issue when our Systems guy runs the VEEAM backup.
When I came into the office today, the BGP session timers on the Palo Alto side had been running strong for about 14 hours. Once we began sending data over the tunnels, the primary established BGP session reset again.
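To put numbers on the MTU suspicion, here is a minimal Python sketch of the usual ESP tunnel-mode overhead arithmetic. The byte counts are assumptions for common cases (outer IPv4 header, NAT-T UDP encapsulation, and either AES-CBC with HMAC-SHA1-96 or AES-GCM); the actual transform set on these tunnels was not stated, so substitute its values. `max_inner_packet` and `tcp_mss` are hypothetical helpers.

```python
def max_inner_packet(path_mtu=1500, outer_ip=20, nat_t_udp=8, esp_header=8,
                     iv=16, icv=12, block=16):
    """Largest inner IP packet that fits in one ESP tunnel-mode packet.

    Defaults assume IPv4 + NAT-T + AES-CBC (16-byte IV/block) + HMAC-SHA1-96.
    """
    # Bytes left for the encrypted part: inner packet + padding + 2-byte trailer.
    room = path_mtu - outer_ip - nat_t_udp - esp_header - iv - icv
    # The encrypted part must be a whole number of cipher blocks.
    encrypted = (room // block) * block
    # Remove the ESP trailer (pad-length and next-header bytes).
    return encrypted - 2


def tcp_mss(inner_packet, ip_header=20, tcp_header=20):
    """TCP payload that fits in an inner packet with no IP/TCP options."""
    return inner_packet - ip_header - tcp_header


cbc = max_inner_packet()
gcm = max_inner_packet(iv=8, icv=16, block=4)  # assumed AES-GCM values
print("AES-CBC/SHA1:", cbc, "bytes inner, MSS", tcp_mss(cbc))  # 1422, 1382
print("AES-GCM:     ", gcm, "bytes inner, MSS", tcp_mss(gcm))  # 1438, 1398
```

Under these assumptions, any inner packet larger than roughly 1420-1440 bytes cannot cross a 1500-byte path in one ESP packet and must be fragmented or dropped, which is why clamping the TCP MSS below these figures is a common mitigation.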

Giuseppe Larosa · 02-16-2025

Hello @edwardschubauer,

Cisco Firepower devices are not easy to troubleshoot, because most troubleshooting is done on the CLI, and there are several CLIs.

Note that the CLI shell you can open from the GUI does not allow all commands to be executed.

Use PuTTY or SecureCRT to have a stable session with the Firepower.

In an FP1120 HA pair, only the active FW speaks.

Note that for most third-party FWs, including Meraki MX, you need to provide the peer ID as the private IP address of the outside interface.

Log files can be attached as text files.

From PuTTY/SecureCRT you get the prompt:

>

This is the CLISH shell; from here you can use:

show vpn-sessiondb l2l

show crypto ipsec sa peer <remote public IP address>

Depending on how you deployed the FW HA pair, you may have only a subset of features. With FDM, the FP lacks all QoS features, so it cannot give priority to BGP session packets and it cannot shape your Veeam backup traffic over the tunnel.

If you are using FDM on the FP, you need to shape the traffic on the Palo Alto side.
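The idea behind shaping the Veeam traffic can be illustrated with a token bucket: bulk packets are delayed so they never exceed a configured rate, which leaves headroom on the tunnel for BGP keepalives and interactive sessions such as RDP. The sketch below is a hypothetical Python model of that mechanism only; it is not Palo Alto or Firepower configuration, and `TokenBucketShaper` plus any rate and burst figures are assumptions.

```python
class TokenBucketShaper:
    """Hypothetical token-bucket shaper: delays packets instead of dropping them."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0           # refill rate in bytes per second
        self.capacity = burst_bytes          # maximum burst size in bytes
        self.tokens = float(burst_bytes)     # bucket starts full
        self.last = 0.0                      # time of the last departure/update

    def schedule(self, size, arrival):
        """Return the time at which a packet of `size` bytes may be sent."""
        if size > self.capacity:
            raise ValueError("packet larger than burst size")
        # FIFO: a packet cannot leave before the previous one has left.
        t = max(arrival, self.last)
        # Refill tokens for the elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        if self.tokens < size:
            # Delay the packet until enough tokens have accumulated.
            t += (size - self.tokens) / self.rate
            self.tokens = float(size)
        self.tokens -= size
        self.last = t
        return t
```

For instance, `TokenBucketShaper(rate_bps=80_000_000, burst_bytes=64_000)` would model capping backup traffic at an assumed 80 Mbps on the 100 Mbps primary link; the real limit belongs in the firewall's own QoS policy.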

With FMC, a management server that can handle multiple sensors (FTD devices), either standalone or in HA pairs, you have more features.

Hope to help

Giuseppe

rais · 02-17-2025

Local = 10.183.255.14/31 and Remote = 10.183.255.10/31. Typo?

Would installing static routes keep the traffic going?

edwardschubauer · 02-18-2025

Yes, it was a typo; I fixed it. We installed static routes as a test to bypass BGP over the tunnel, and the VEEAM traffic still failed. Here are the logs the Systems guy provided from his VEEAM backup:

2/17/2025 11:12:49 PM :: Backup copy task started at 2/17/2025 11:12:49 PM
2/17/2025 11:13:35 PM :: Failed to execute backup copy task
2/17/2025 11:13:48 PM :: Backup copy task started at 2/17/2025 11:13:48 PM
2/17/2025 11:16:15 PM :: Failed to process disk 9e96a229-09dd-4eef-ab29-16793e91cdb0
2/17/2025 11:16:16 PM :: Failed to execute backup copy task
2/17/2025 11:16:50 PM :: Backup copy task started at 2/17/2025 11:16:50 PM
2/17/2025 11:18:12 PM :: Failed to process disk 9e96a229-09dd-4eef-ab29-16793e91cdb0
2/17/2025 11:18:13 PM :: Failed to execute backup copy task
2/17/2025 11:18:52 PM :: Backup copy task started at 2/17/2025 11:18:52 PM
2/17/2025 11:20:21 PM :: Processing disk 9e96a229-09dd-4eef-ab29-16793e91cdb0
2/17/2025 11:20:44 PM :: Processing disk 19388249-9c95-4daf-85d5-33c55b285a0a
2/17/2025 11:21:20 PM :: Processing disk 0e063c78-a9c2-442d-9def-11a484c34305
2/17/2025 11:21:40 PM :: Processing disk 090df1ac-014a-44b6-af1e-e757b8975a0b
2/17/2025 11:22:51 PM :: Processing disk 91bf08b2-49e6-475a-aa52-143c9ea3e299
2/18/2025 12:21:58 AM :: Failed to process disk bda0b19d-a4c7-4fa8-bc98-ebc4907cef79
2/18/2025 12:21:58 AM :: Failed to execute backup copy task
2/18/2025 12:23:54 AM :: Backup copy task started at 2/18/2025 12:23:54 AM
2/18/2025 12:25:30 AM :: Failed to process disk 9e96a229-09dd-4eef-ab29-16793e91cdb0
2/18/2025 12:25:30 AM :: Failed to execute backup copy task
2/18/2025 12:25:58 AM :: Backup copy task started at 2/18/2025 12:25:58 AM
2/18/2025 12:26:31 AM :: Failed to execute backup copy task
2/18/2025 12:26:31 AM :: Processing finished with errors at 2/18/2025 12:26:31 AM
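Pairing each "Backup copy task started" line with the next "Failed to execute backup copy task" line shows how long each attempt survived. A minimal Python sketch, assuming the log keeps the `M/D/YYYY h:mm:ss AM/PM :: message` layout shown above; `attempt_durations` is a hypothetical helper, and the excerpt is trimmed to the start and failure lines it uses.

```python
from datetime import datetime

TIMESTAMP = "%m/%d/%Y %I:%M:%S %p"

LOG = """\
2/17/2025 11:12:49 PM :: Backup copy task started at 2/17/2025 11:12:49 PM
2/17/2025 11:13:35 PM :: Failed to execute backup copy task
2/17/2025 11:13:48 PM :: Backup copy task started at 2/17/2025 11:13:48 PM
2/17/2025 11:16:16 PM :: Failed to execute backup copy task
2/17/2025 11:16:50 PM :: Backup copy task started at 2/17/2025 11:16:50 PM
2/17/2025 11:18:13 PM :: Failed to execute backup copy task
2/17/2025 11:18:52 PM :: Backup copy task started at 2/17/2025 11:18:52 PM
2/18/2025 12:21:58 AM :: Failed to execute backup copy task
2/18/2025 12:23:54 AM :: Backup copy task started at 2/18/2025 12:23:54 AM
2/18/2025 12:25:30 AM :: Failed to execute backup copy task
2/18/2025 12:25:58 AM :: Backup copy task started at 2/18/2025 12:25:58 AM
2/18/2025 12:26:31 AM :: Failed to execute backup copy task
"""


def attempt_durations(log_text):
    """Seconds from each task start to the next 'Failed to execute' line."""
    durations = []
    started = None
    for line in log_text.splitlines():
        if " :: " not in line:
            continue  # skip anything that is not a timestamped entry
        stamp, message = line.split(" :: ", 1)
        when = datetime.strptime(stamp.strip(), TIMESTAMP)
        if message.startswith("Backup copy task started"):
            started = when
        elif message.startswith("Failed to execute backup copy task") and started is not None:
            durations.append(int((when - started).total_seconds()))
            started = None
    return durations


print(attempt_durations(LOG))
```

For this excerpt it reports 46, 148, 83, 3786, 96 and 33 seconds: five attempts failed within about two and a half minutes, and one ran for just over an hour.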

UPDATE: The VEEAM backup completed successfully overnight for the first time.

vishalbhandari · 02-18-2025

@edwardschubauer Your IPSec tunnels are flapping due to high traffic loads, which is affecting BGP stability. Since the tunnels reset when running RDP and VEEAM backups, it could be due to traffic spikes causing issues with rekeying, NAT traversal, or stability under load.

To debug BGP events on Cisco Firepower, you can check logs with:

BGP Logs:
show bgp ipv4 unicast summary
show bgp vpnv4 unicast all summary
These will show BGP peer status and any flaps.

System Logs for BGP:
show logging | include BGP
This will help identify why the session is dropping.

IPSec Debugging:
show crypto ikev2 sa
show crypto ipsec sa
show logging | include IKE
This will show tunnel rekeying events or failures.

CPU and Memory Load:
show processes cpu sorted
show processes memory
High CPU/memory may indicate that traffic spikes are overwhelming the firewall.

Since the issue happens under load, check if the Firepower is hitting a bandwidth limit or dropping packets due to NAT exhaustion. Also, ensure Dead Peer Detection (DPD) and keepalives are correctly configured on both sides to maintain tunnel stability.

edwardschubauer · 02-18-2025

Found some interesting stuff in a pcap: packet lengths going over 1500 on the VEEAM backups. This is sorted, but there were packets in the 2000s and 4000s as well.

Giuseppe Larosa · 02-19-2025

Hello @edwardschubauer,

The MTU used on the LAN side of the VEEAM hosts matters a lot. Nice finding.

Best Regards

Giuseppe

edwardschubauer · 02-19-2025

It's a nice find, but now I have to figure out what to do with this information.

MHM Cisco World · 02-19-2025

BGP is established only with the active unit; the active unit syncs the prefixes to the standby.

In HA, there is no need for both units to establish BGP with the remote peer.

MHM

MHM Cisco World · 02-19-2025

BGP is supported in Active/Standby and Active/Active HA configurations.
Only the Active unit listens on TCP port 179 for BGP connections from peers.
The Standby unit does not participate in BGP peering, and hence does not listen on TCP port 179 and does not maintain the BGP tables.

https://www.cisco.com/c/en/us/support/docs/security/asa-5500-x-series-next-generation-firewalls/118050-config-bgp-00.html#toc-hId--148732452

MHM