09-25-2015 07:07 AM - edited 03-08-2019 01:56 AM
Hi All,
I am having a problem between one of my branch sites and my data center. Please see the attached topology drawing. Client connections to the sales server in the data center are intermittently dropping their sessions. Wireshark shows that the clients at the branch send data to the sales server in the data center, and the server receives the data and ACKs it, but the ACKs never make it back to the client PCs at the branch. This results in the clients retransmitting their data until their keepalive and retry timers expire, at which point they disconnect the session. When they reconnect, everything is fine until the issue crops back up, which can take anywhere from minutes to days.
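For reference, the retransmissions are easy to spot in the captures with a display filter along the lines of the following (the address is the sales server):
tcp.analysis.retransmission && ip.addr == 172.16.1.200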
Sniffing on the primary path firewall in the data center shows the following:
2015-09-25 07:20:29 Local4.Info Sep 25 2015 07:20:51: %ASA-6-106015: Deny TCP (no connection) from 172.16.1.200/512 to 192.168.90.127/52061 flags PSH ACK on interface inside
Doing a little research on this log message shows that 99% of the time "Deny TCP (no connection)" is caused by asymmetric routing, i.e. when a site has more than one exit point and no path control in place. That is my scenario, except that I have IP SLA configured on the layer 3 switches tracking the primary route on both ends, with a floating backup static route at a higher administrative distance that should only come into play if the SLA fails. So my path control is there, but asymmetric routing still appears to be happening. The SLA configs are attached.
Any ideas are appreciated.
09-25-2015 07:19 AM
Dean
I can't access your text file but can see the diagram.
Are the firewalls at each site standalone?
By the looks of the IP addresses on the outside interfaces at the DC it would appear so, but then the client firewalls' inside interfaces are on the same IP subnet.
What does a traceroute show, or are you not allowing that through the firewalls?
It could very well be a problem with asymmetric routing, as you say, but if you are hardcoding the paths it's difficult to see why that would be the case.
There is a feature called TCP state bypass on the ASAs which could potentially fix the issue.
I say fix, but you are in effect turning off stateful checking of the connection, so I'm not sure whether I would want to use it unless there was no other way.
Edit - when you see the issue, do you see any IP SLA events?
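For example, something along these lines on the L3 switches should show whether the tracked objects have ever flapped:
show track
show ip sla statistics
show logging | include TRACK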
Jon
09-25-2015 08:15 AM
Hi Jon,
Thanks for replying. Here's the text file:
Branch Side SLA on Core SW:
track 100 ip sla 1 reachability
delay down 5 up 30
ip sla 1
icmp-echo 50.xxx.xx.190
frequency 10
ip sla schedule 1 life forever start-time now
ip route 0.0.0.0 0.0.0.0 192.168.90.5 track 100
ip route 50.xxx.xx.190 255.255.255.255 192.168.90.5
ip route 0.0.0.0 0.0.0.0 192.168.90.1 251
===============================================================
Data Center Side SLA on Core SW:
ip sla 2
icmp-echo 75.xxx.xxx.82
frequency 10
ip sla schedule 2 life forever start-time now
track 101 ip sla 2 reachability
delay down 5 up 30
ip route 192.168.90.0 255.255.255.0 198.xxx.xx.55 track 101
ip route 192.168.92.0 255.255.255.0 198.xxx.xxx.55 track 101
ip route 75.xxx.xxx.82 255.255.255.255 198.xxx.xxx.55
ip route 192.168.90.0 255.255.255.0 198.xxx.xxx.2 251
ip route 192.168.92.0 255.255.255.0 198.xxx.xxx.2 251
Yes, the firewalls are physically separate. At each site, both firewalls' inside interfaces sit in the same LAN subnet, and each WAN line goes to a different ISP. Unfortunately the traceroutes from the core switch go over the VPN tunnels, so they aren't terribly helpful; there is only one hop, and it is the destination. The SLAs show no bouncing, unfortunately.
I agree with you about the bypass, but this has been going on for about 2 months now, so I'm at the point where I either need to tear down the redundant path entirely or try out the bypass idea. I did have one other thought though:
Let's say I blow away all the SLA configs, the tracked route, and the backup route, and do the following:
1. Configure the primary static route only, and shut down the port on the 3750 switch at the branch facing the backup ASA.
2. Configure the IP SLA and track object, but don't tie the tracking to any route.
3. Configure EEM to look for: event syslog pattern "101 ip sla 2 reachability Up->Down"
4. When EEM sees that, it will execute a script that negates the primary static route, configures the backup static route, shuts down the switchport to the primary ASA, and no-shuts the switchport to the backup ASA (rough sketch below).
I could do that on both ends, with the exception of the port shutdown part on the DC side.
That would effectively turn it into one exit point, wouldn't it?
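Roughly, I'm picturing something like this on the branch 3750 (the interface numbers are just placeholders, the next hops are from my config above, and the syslog pattern would need to match the actual %TRACK-6-STATE message text; the DC side would use its own track/SLA numbers and skip the port shut/no-shut actions):
track 100 ip sla 1 reachability
ip sla 1
icmp-echo 50.xxx.xx.190
frequency 10
ip sla schedule 1 life forever start-time now
!
! swing the default route and flip the uplinks when the track object goes down
event manager applet FAILOVER-TO-BACKUP
event syslog pattern "100 ip sla 1 reachability.*Down"
action 1.0 cli command "enable"
action 2.0 cli command "configure terminal"
action 3.0 cli command "no ip route 0.0.0.0 0.0.0.0 192.168.90.5"
action 4.0 cli command "ip route 0.0.0.0 0.0.0.0 192.168.90.1"
action 5.0 cli command "interface GigabitEthernet1/0/47"
action 6.0 cli command "shutdown"
action 7.0 cli command "interface GigabitEthernet1/0/48"
action 8.0 cli command "no shutdown"
action 9.0 cli command "end"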
09-25-2015 08:30 AM
Dean
It depends on what you mean by work.
Because the firewalls at each site are standalone, even if the path switches you would still see those messages, because packets that are part of an existing connection are now going through a firewall that has no record of that connection.
So the switchover itself is not really the issue, although it would be good for both sides to agree on which path they are meant to be using.
You could do what you are proposing, but even then, if the client and server have not completely timed out their connections, the "new" firewall will report seeing packets that are not part of any connection in its state table.
If the problem is that both ends are not using the same path, then yes, you could do what you are proposing, although I have to say that if you are not seeing any IP SLA events now it's not clear what's happening.
The only other solution I could suggest, although I haven't used it myself, is that I believe you can run OSPF over an IPsec tunnel. The advantage of using a dynamic routing protocol is that a failure anywhere along the path means both ends (the L3 switches) stop receiving routes.
You could then have a simple floating static on each L3 switch pointing to the secondary ASA, which would take effect only if there were no OSPF routes.
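Conceptually, on the DC L3 switch it would look something like this, assuming the tunnel toward the branch can carry the OSPF adjacency (the process ID, area, and advertised subnet are placeholders, and 172.16.1.0/24 is just a guess based on the sales server address; the floating static next hop is from your existing config). The OSPF-learned route at AD 110 wins whenever the adjacency is up, and the floating static at AD 251 takes over when it isn't:
router ospf 1
network 172.16.1.0 0.0.0.255 area 0
!
! floating static via the secondary ASA, only used if the OSPF routes disappear
ip route 192.168.90.0 255.255.255.0 198.xxx.xxx.2 251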
But again this wouldn't solve the issue of there being no state shared between the firewalls for the existing connections.
Jon
09-25-2015 09:50 AM
Hi Jon,
Yeah, it is a strange issue. I put in a maintenance window last night and reloaded the data center primary ASA, and I've only had the issue happen once today, but this time the Deny TCP log message came in on the branch primary ASA rather than the DC one, whereas in the past it always seemed to be the DC one.
We've had this configuration in place and running fine for over 3 years until the past 2 months, so I'm wondering if I was on the right track with the DC ASA reload, but maybe the full path needs to be reloaded and not just the DC side (i.e. reload the branch ASAs as well).
According to the SLA history, I don't see any flips, so if packets are going to the backup ASA, I'm not sure how. Unless there is another explanation for the log message.
As for the OSPF, unfortunately the ASA models we have in the field don't support dynamic routing.
Just for kicks, how would I go about configuring the bypass for this situation?
09-25-2015 11:48 AM
You don't say which version of ASA software you are using, but an internet search should turn up examples for your version, or you can use the configuration guides.
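On most versions the configuration follows the same general shape, something like the below on the DC ASA (the ACL, class-map, and policy-map names are placeholders, the addresses are the branch client subnet and the sales server from your logs, and I'm applying the policy on the inside interface; check the guide for your exact release):
access-list SALES-BYPASS extended permit tcp 192.168.90.0 255.255.255.0 host 172.16.1.200
access-list SALES-BYPASS extended permit tcp host 172.16.1.200 192.168.90.0 255.255.255.0
!
class-map SALES-BYPASS-CLASS
match access-list SALES-BYPASS
!
policy-map INSIDE-POLICY
class SALES-BYPASS-CLASS
set connection advanced-options tcp-state-bypass
!
service-policy INSIDE-POLICY interface inside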
I am still not entirely sure what is happening; if you are not seeing any IP SLA events it seems unlikely that traffic is using the backup path.
I'll have a dig around, but you are right, the message is usually indicative of asymmetric routing.
Jon
10-01-2015 05:46 AM
Hi Jon,
I pulled the IP SLA off the site and shut down the path to the backup, and we've been solid for 4 days now. It makes no sense to me, since the SLA should have controlled the routing, unless maybe the switch is on its way out.
Thanks for your help with this.