Drop between two Sites - BGP Issues?

Pascal85045
Level 1
Hi all,
 
I'm currently experiencing Network Problems between two Datacenters which I can't make sense of. I have been banging my head against this for days.
 
About the Setup:
Datacenter in Germany has a Cisco ASA 5525-X (9.8.4.32), US Datacenter has a Cisco ASA 5516-X (9.8.4.20). The ASAs are directly connected to the ISP, so there is no Router in front.
 
We are using IKEv2 VPN with BGP and VTI between those Sites.
 
The problem I'm experiencing is: Sometimes we have a data outage between both Datacenters, external and internal. I have pingplotter running on both sides.
 
Site A pings Site B (external IP Address + multiple Servers internally over VPN)
Site B pings Site A (external IP Address + multiple Servers internally over VPN)
 
When the Problem happens we have heavy Packet Loss, internal and external, in both directions. The Routing between the two DCs is asymmetric: the Path from A to B is different from the Path from B to A. For example, Germany to US uses Cogentco while US to Germany uses Level3.

Unfortunately we can't change that at the moment.
 
 
When checking the Path there is no route change or intermediate Hop dropping out. Packet loss is shown directly on the Firewall by Pingplotter.
 
That means: Site A shows ASA of Site B with heavy Packet loss. Site B shows ASA of Site A with heavy Packet loss.
 
All other Targets are fine: Site A has multiple VPNs to other DCs, and none of them drop. Pings to HA Targets like 8.8.8.8 show not one Packet lost. The same is true on Site B.
 
I'm monitoring Site A and Site B from other external Targets like AWS. The external Monitoring shows no Packet Drop on the external or internal Interface of either A or B during the Outage between A and B. So I would exclude a general ISP issue. We have multiple Racks in the German Datacenter which are using the same ISP and Network Cycle, but their own ASA of course. They do not show a drop on the Interfaces of A and B.
The VPN-Tunnel between both Site A and Site B stayed up during that time, it did not drop.
 
So there seems to be a problem just between A and B which I can't figure out.
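One way to check whether the loss seen from both sides really lines up in time is to reduce each monitoring log to loss windows and intersect them. The following is a minimal sketch in Python. It assumes each log has already been converted to a list of (timestamp in seconds, reply received) samples; that sample format, the function names, and the `min_lost` threshold are illustrative assumptions, not PingPlotter's actual export format.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]   # (timestamp in seconds, reply received?)
Window = Tuple[float, float]  # (first lost probe, last lost probe)


def loss_windows(samples: List[Sample], min_lost: int = 3) -> List[Window]:
    """Return windows of at least `min_lost` consecutive lost probes."""
    windows: List[Window] = []
    run: List[float] = []  # timestamps of the current streak of lost probes
    for ts, ok in sorted(samples):
        if not ok:
            run.append(ts)
            continue
        # A reply ends the streak; keep it only if it was long enough.
        if len(run) >= min_lost:
            windows.append((run[0], run[-1]))
        run = []
    if len(run) >= min_lost:  # the log ended while probes were still lost
        windows.append((run[0], run[-1]))
    return windows


def overlaps(a: List[Window], b: List[Window]) -> List[Window]:
    """Return the time ranges where a window from `a` and one from `b` overlap."""
    result: List[Window] = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start <= end:
                result.append((start, end))
    return result
```

If the windows from Site A and Site B overlap closely while the external monitoring shows nothing in the same range, that would support the theory that only the A-to-B path is affected.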
 
Any suggestions on how to troubleshoot further? I thought that maybe the BGP VPN Tunnel with VTI causes a routing issue. But I can't find anything in the logs.

Thanks in advance for any help or suggestion.
 
7 Replies

Hello,

 

Tough one. It sounds like you have already done a lot of troubleshooting. How much traffic is flowing over the link? Make sure that you are not running into a traffic-volume-based rekeying problem. Your best option is to set the 'Traffic Volume' to unlimited. The link below has a screenshot of how to enable this in ASDM (point 4).

 

https://cohesivenet.zendesk.com/hc/en-us/articles/115000044872-Disabling-Cisco-s-data-lifetimes-through-ASDM

Thanks for the Input. I will give this a try as well.

 

We are in contact with the DC as well; it could be DDoS Protection or something similar kicking in. At least that's one option we are looking into.

Sorry, I forgot: we have around 20 Mbit/s between both Datacenters.
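To put the volume-based rekeying suggestion into numbers, here is a rough calculation of how often a data-lifetime rekey would occur at around 20 Mbit/s. This is only a sketch under stated assumptions: the 4,608,000 KB data lifetime is assumed here (it is believed to be the ASA default, but verify it against the running configuration), 1 KB is taken as 1024 bytes, and the traffic is treated as a steady stream counted against a single SA.

```python
def seconds_until_volume_rekey(lifetime_kb: int, rate_mbit_s: float) -> float:
    """Time until a data-lifetime rekey at a constant traffic rate."""
    lifetime_bits = lifetime_kb * 1024 * 8   # assumes 1 KB = 1024 bytes
    rate_bits_s = rate_mbit_s * 1_000_000    # Mbit/s to bit/s
    return lifetime_bits / rate_bits_s


if __name__ == "__main__":
    # 4_608_000 KB is an assumed lifetime; check the value on the devices.
    secs = seconds_until_volume_rekey(lifetime_kb=4_608_000, rate_mbit_s=20)
    print(f"volume-based rekey about every {secs / 60:.1f} minutes")
```

Under these assumptions the rekey would happen roughly every 31 minutes, so comparing that interval with the spacing of the outages could show whether rekeying is a plausible cause.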

Joseph W. Doherty
Hall of Fame

For your PingPlotter results, pinging external to the tunnel, you see no jump in latencies and/or drops along the path?

If not, possibly that's because of your asymmetrical route paths.

In the past I've found using IP source routing handy for probing paths "invisible" to "normal" routing, but nowadays most networks disallow that option.

If you could get your routing to stop using two different paths for the two directions, you might be able to "see" a problem node when this happens.

Hi Joseph,

 

That's right. We don't see any jump in Latency along the Path. We are currently in touch with the DC to make sure no DDoS Protection or something similar is kicking in.

 

We are in the process of getting private circuits. Should be done in 4 weeks. But I'm still trying to change the routing before that for testing.

Hello
You don't say what kind of outage you're experiencing. Do you drop BGP/tunnel peering? Do you see IGP failover, reconvergence, etc.?
Can you elaborate a bit more on the actual outage?



Kind Regards
Paul

Hi Paul,

The Outage means all Data Traffic between those two Datacenters stopped.

We are using an Active / Standby Cluster, but no Failover happened on either side during the Outage.

The VPN Tunnel between both Datacenters stayed up, probably due to the preserve VPN Option.

When checking the BGP and the Logs, there was no route change or reconvergence. Everything stayed the same (from what I can see in the logs).

Logging is set to export everything at Warning / Error Level. BGP Messages could be at Informational or Debugging Level, so I may have to fine-tune that. I will look into that now and adjust it.
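Syslog severities run from 0 (emergencies) to 7 (debugging), and a configured logging level passes that level plus everything more severe. The small Python sketch below illustrates why messages at Informational or Debugging Level would never reach a destination set to Warning. The function name is illustrative, and which severity the ASA actually assigns to its BGP messages is not stated in this thread, so it is left as a parameter.

```python
# Cisco-style syslog severity names, index = numeric severity (0 is most severe).
SEVERITIES = [
    "emergencies",    # 0
    "alerts",         # 1
    "critical",       # 2
    "errors",         # 3
    "warnings",       # 4
    "notifications",  # 5
    "informational",  # 6
    "debugging",      # 7
]


def is_logged(message_level: str, configured_level: str) -> bool:
    """True if a message at `message_level` passes a `configured_level` filter."""
    return SEVERITIES.index(message_level) <= SEVERITIES.index(configured_level)


if __name__ == "__main__":
    for level in ("errors", "notifications", "informational", "debugging"):
        print(f"{level:>13} with logging at warnings: {is_logged(level, 'warnings')}")
```

With logging at warnings, only levels 0 to 4 are exported, so the destination would need to be raised to at least the level the BGP messages use before they appear.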

I've attached a picture of how it looks in Pingplotter. The picture looks the same in both directions (Site A --> Site B and Site B --> Site A).


I've upgraded the US-Datacenter now to 9.8.4(32) as well.

Thanks
