AnyConnect backup server failover time

CYKmb · ‎05-30-2024

I have a VPN profile set up for AnyConnect with a backup server configured (AnyConnect Secure Mobility Client 4.10).

If the primary server is not available, the client fails over to the backup server and connects as expected.

The client takes over 60 seconds before it will even try to connect to the backup server.

Is there any way to set the connection timeout (not the authentication timeout, we haven't got that far yet!) to a smaller value? Really, if the client can't get an IP response back from the host within about 5 seconds, it should consider that host dead and move on to the backup.

EDIT: LOAD BALANCING IS NOT THE RIGHT SOLUTION FOR MY PROBLEM.
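
For anyone who wants to see what a short connection timeout would look like outside the client, here is a minimal Python sketch. It is not an AnyConnect feature or setting: it only tries a TCP handshake against each headend in order and returns the first one that answers within a few seconds. The function name, port 443 and the 3-second default are assumptions made for illustration.

```python
import socket


def first_reachable(hosts, port=443, timeout=3.0):
    """Return the first host that completes a TCP handshake within
    `timeout` seconds, or None if none of them answers.

    Illustrative helper only; it is not part of AnyConnect.
    """
    for host in hosts:
        try:
            # create_connection raises OSError on refusal, timeout,
            # or name-resolution failure.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            continue
    return None
```

Calling `first_reachable(["vpn1.example.com", "vpn2.example.com"])` (placeholder names) would give the host to hand to the client. Note that the timeout bounds each connection attempt, not the DNS lookup that precedes it.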

ccieexpert · ‎05-30-2024

As far as I can see, there is no such setting in the XML. You may want to ask your Cisco or partner account team to request the feature. Depending on your platform, you may be able to set up VPN load balancing. You could also look at a few other options, including a DNS load balancer that monitors the VPN headend and marks it down when it fails. When a user connects to "vpn.mydomain.com", it returns the IP of a VPN headend that is active.

balaji.bandi · ‎05-30-2024

There are different ways to do it: a DNS LB to decrease the timers, or the VPN profile XML file.

In either case, lowering the value has side effects. In my experience, 15-20 seconds is reasonable.

You mentioned that failover is already working, so you just need to fine-tune it. If this is a large environment, it is better to try the change in a test environment before you deploy it to the live setup.

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

CYKmb · ‎06-03-2024

DNS LB is not really the right solution in this case. Honestly, if the client doesn't receive an IP response from the server within 2-3 seconds, that link is unreachable and the next one should be tried.

I couldn't find a connection timeout value in the profile XML, just the authentication timeout. I need to keep the authentication timeout long enough for 2FA responses (humans are slow!), but the users shouldn't have to sit for a long time waiting for the connection to time out. It also seems like the authentication timer doesn't start ticking until the IP session is established anyway.

Do you know what the XML tag/key is for server connection timeout?
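
To make the gap concrete, here is a small Python sketch that lists what a profile does carry: the authentication timeout and the primary/backup addresses. The embedded XML is a trimmed, hypothetical profile with placeholder hostnames; the element names (`AuthenticationTimeout`, `HostEntry`, `HostAddress`, `BackupServerList`) are the ones the profile editor writes as far as I know, so check them against your own file. Nothing in it represents a connection timeout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down profile. Hostnames and the timeout value
# are placeholders, not taken from the thread.
PROFILE = """\
<AnyConnectProfile xmlns="http://schemas.xmlsoap.org/encoding/">
  <ClientInitialization>
    <AuthenticationTimeout>60</AuthenticationTimeout>
  </ClientInitialization>
  <ServerList>
    <HostEntry>
      <HostName>Primary VPN</HostName>
      <HostAddress>vpn1.example.com</HostAddress>
      <BackupServerList>
        <HostAddress>vpn2.example.com</HostAddress>
      </BackupServerList>
    </HostEntry>
  </ServerList>
</AnyConnectProfile>
"""


def summarize_profile(xml_text):
    """Return (authentication_timeout, [(primary, [backups]), ...])."""
    root = ET.fromstring(xml_text)
    # "{*}" matches any XML namespace (Python 3.8+).
    timeout = root.findtext(".//{*}AuthenticationTimeout")
    entries = []
    for entry in root.iterfind(".//{*}HostEntry"):
        primary = entry.findtext("{*}HostAddress")
        backups = [
            e.text for e in entry.iterfind("{*}BackupServerList/{*}HostAddress")
        ]
        entries.append((primary, backups))
    return timeout, entries
```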

balaji.bandi · ‎06-03-2024

If DNS has A records for both ASAs under the same FQDN, and one of them is down and does not resolve, the client will automatically go to the next one, round robin. In my testing it takes 20 seconds before it fails over.

If that does not meet your requirement, you can use an LB solution so that resolution is handled correctly for the clients.

BB


CYKmb · ‎06-05-2024

Perhaps I am not understanding your recommendation because that just makes no sense at all. DNS is not typically aware of the state of the host, so it will continue to resolve, even if the host is down. It may also be the case that the host is not actually down, just unreachable from all routes, so DNS is still resolvable, as it should be so the hosts that can, will be able to reach it.

ccieexpert · ‎06-07-2024

Please read my reply. Yes, this is very much possible with an intelligent DNS load balancer that checks the state of the servers/hosts.

ccieexpert · ‎06-07-2024

Hello, let me explain further, as I think I was not clear.

Many DNS providers will run a health check (such as HTTP/HTTPS or ping) against the ASAs. If one is down, DNS resolution returns the IP of the second ASA unit. Please read this: https://community.f5.com/kb/technicalarticles/using-f5-distributed-cloud-dns-load-balancer-health-checks-and-dns-observability/325073 . Hope that helps; this is quite useful. There is also the regular F5 load balancer: your HTTPS goes to the F5 or another load balancer, which can then send it to an active ASA.
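
The difference between plain DNS and a health-checked answer can be sketched in a few lines of Python. This is a toy model, not how F5 or any other provider implements it: the function names are made up, the addresses are documentation placeholders, and `is_up` stands in for whatever HTTP/HTTPS or ping check the provider runs.

```python
def plain_dns_answer(records):
    """Plain DNS: every configured record is returned, whether or not
    the host behind it is alive."""
    return list(records)


def health_checked_answer(records, is_up):
    """Health-checked DNS: only records whose check passes are returned.

    `is_up` is a caller-supplied function (hypothetical) that reports
    the result of the most recent health check for an address.
    """
    return [ip for ip in records if is_up(ip)]


# Placeholder addresses; the first headend is marked down.
records = ["192.0.2.10", "198.51.100.10"]
status = {"192.0.2.10": False, "198.51.100.10": True}

print(plain_dns_answer(records))                    # both addresses
print(health_checked_answer(records, status.get))   # only the healthy one
```

The health check runs from wherever the DNS provider probes, so a headend that is unreachable only from some client networks can still be reported as up.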

tvotna · ‎06-10-2024

@CYKmb, not only is the timeout to switch to the backup server unconfigurable, the connection is also not re-attempted to the backup server if the primary server fails in the middle of an SSL/IPsec session. In other words, your established AnyConnect tunnels will get stuck in the "reconnecting" state (almost) forever if the primary server fails suddenly.

CSCte15276 Ability to use backup server list if initial session is lost
CSCte15271 Ability to support sharing session information for Anyconnect backup

You'd be better off using the VPN load-balancing feature, where the VPN cluster master checks the availability of cluster members and stops redirecting incoming tunnels to a member that goes down. This provides both load balancing and redundancy. It doesn't solve the issue of a member going down in the middle of a connection, though. AnyConnect still gets stuck in the reconnecting state in that case.

The backup server switchover time is long because the client tries to switch over after 3 TCP SYN packets, not sure if it is really 60 seconds though.
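
As a rough sanity check on the "3 TCP SYN packets" figure, the Python sketch below adds up how long three unanswered SYNs take when the retransmission timer doubles after each one. The initial timer values are assumptions: RFC 6298 recommends 1 second, and the older recommendation was 3 seconds. Real stacks differ, and nothing here confirms that the client relies on the OS timers.

```python
def syn_give_up_seconds(initial_rto, attempts):
    """Total wait before giving up when each unanswered SYN doubles
    the retransmission timer (initial_rto, 2x, 4x, ...)."""
    return sum(initial_rto * 2 ** i for i in range(attempts))


for rto in (1.0, 3.0):
    print(rto, syn_give_up_seconds(rto, 3))
# 1.0 7.0
# 3.0 21.0
```

Under these assumptions three SYNs account for roughly 7 to 21 seconds, not 60, so a longer delay would point to an additional timer somewhere above TCP.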

ccieexpert · ‎06-10-2024

Good points, but the built-in VPN load balancing feature only supports local load balancing within a single site, not geographical load balancing or redundancy. In those cases, an external load balancer or a DNS load balancer is better.

CYKmb · ‎06-11-2024

I won't get into the details, but as I have repeatedly said: LB is not the right solution for this problem, and it's not really a VPN cluster. The reconnect is less of a concern, it's really just the initial connection. Unfortunately, my budget and time restrictions limit my ability to implement the right solution, so I am forced to work with the tools I have available.

Quoting tvotna: "The backup server switchover time is long because the client tries to switch over after 3 TCP SYN packets, not sure if it is really 60 seconds though."

The timeout is exactly 60 seconds. I can see it in the log traces, between the initial connection attempt and the switch to the backup. It really just sits there for 60 seconds and does nothing. I can't fathom a network client that has been in development for at least two decades and still doesn't have a connection timeout setting, but here we are...

tvotna · ‎06-11-2024

For clarity's sake, I'm talking about the load balancing built into the ASA/FTD software, which doesn't require any additional products, services or licenses. It requires two or more standalone ASA/FTD units (or failover pairs, although failover is rarely used in this scenario) with outside interface IP addresses assigned from the same subnet (inside interfaces can reside in any VLAN/subnet). One unit is configured as the master unit (by assigning priority). It redirects connection requests to other units or to itself, taking the number of VPN sessions on each unit into consideration.

CYKmb · ‎06-11-2024

That doesn't answer the question I asked at all, and as per the parameters of the original question this needs to work when the primary server is not reachable. If it's not reachable, it can't very well redirect responses, can it?

tvotna · ‎06-11-2024

As I said earlier, the connection timeout is unconfigurable when the backup servers feature is used. With the VPN load balancing built into the ASA/FTD software, the VPN cluster will still be operational if its master unit fails. A new master unit will be elected within 3 or 6 seconds, because the units exchange hello packets with each other.

CYKmb · ‎06-11-2024

@ccieexpert sorry, the threading on this forum is not very clear. That was a response to @balaji.bandi.

I understand how load balancers work. I have used them, and I have, in fact, built my own. My situation here is a little unique and, let me just say, a little complicated. I am also working within certain constraints. As I have mentioned, though, DNS LB is not really the right solution to this problem.

Thanks for the F5 tip, though. I'm always a bit suspicious of services that won't publish their pricing. I usually find that if you have to ask, you can't afford it.

Really, having the client automatically cycle to the backup as needed just makes things simpler for the end user. I have a ticket open with Cisco to try to reduce the timeout, but there doesn't seem to be a very straightforward solution. It's also possible that I have some weird stuff going on. I tend to push the boundaries a bit, and I'm dealing with some legacy configurations, and working around some security mitigations...and it's all a big rush...