How To: TACACS Failover with F5 BIG-IP Virtual Servers

Damien Miller · 02-06-2019

This is directed at those already using F5s with TACACS, or those who will in the future. It is not meant to be an all-encompassing guide, but rather a supplement covering an issue you need to be aware of. There is a well-known guide, jointly developed by Cisco and F5, that covers configuring BIG-IP for use with ISE. It was written when ISE did not yet support TACACS, so it completely omits the requirements and planning for TACACS. Luckily, most of the configuration is identical; you are just adding TCP port 49.

https://community.cisco.com/t5/security-documents/how-to-cisco-amp-f5-deployment-guide-ise-load-balancing-using/ta-p/3631159

I recently had the opportunity to develop and test a TACACS deployment using F5 load balancers, and I want to bring your attention to an issue that has a solution, just not an obvious one. For the purpose of this discussion, we will use two ISE PSNs in an F5 virtual server group. The F5 performs health checks against each PSN so that the load balancer knows when a node has failed. There is nothing unusual about this; it is standard practice in load balancing design.

One of our tests involved taking down both PSNs in one of the F5 virtual server groups. This is when the issue presented itself: the NAD, a 3850 in this case, would never fail over to the other servers configured in its TACACS server group. The 3850 would keep attempting to authenticate against what was a dead VIP. A packet capture followed, and I have included the exchange below. The 3850 is 10.10.10.10 and the F5 VIP is 10.20.20.20.

10.10.10.10 -> 10.20.20.20 TCP 5157 > tacacs [SYN]
10.20.20.20 -> 10.10.10.10 TCP TACACS > 5157 [SYN, ACK]
10.10.10.10 -> 10.20.20.20 TCP 5157 > tacacs [ACK]
10.10.10.10 -> 10.20.20.20 TACACS+ Q: Authentication
10.20.20.20 -> 10.10.10.10 TCP TACACS > 5157 [SYN, ACK]
10.20.20.20 -> 10.10.10.10 TCP TACACS > 5157 [RST, ACK]

Flags: 0x14 (RST, ACK)
Reset cause: BIG-IP: [0x23d6b9c:2328] No pool member available

The F5 VIP completes the TCP handshake even though it knows that the virtual server members (PSNs) are down. This causes the 3850 to think that the TACACS server (VIP) is still good. The process repeats continuously on the switch: it establishes a connection, then the F5 resets it. As administrators, we can see that an exchange ending in a connection reset (RST) is not ideal, but the switch does not interpret it as a server failure.

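To make the F5 ignore the NADs while the PSNs are in a failed state, the virtual server needs service-down-immediate-action set to drop (in tmsh: modify ltm virtual ise_tcp49_vs service-down-immediate-action drop). The listing below shows the resulting configuration; your naming will vary slightly.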

list ltm virtual ise_tcp49_vs | grep -E "ltm virtual|service-down"
ltm virtual ise_tcp49_vs {
service-down-immediate-action drop

Once the drop action is introduced, the 3850 behaves the way we want. In the following packet capture, the 3850 is 10.10.10.10, the F5 VIP that should be dead is 10.20.20.20, and the alternate F5 VIP that is active is 10.30.30.30.

10.10.10.10 -> 10.20.20.20 TCP 8317 > tacacs [SYN]
10.10.10.10 -> 10.20.20.20 TCP 8317 > tacacs [SYN]
10.10.10.10 -> 10.30.30.30 TCP 8318 > tacacs [SYN]
10.30.30.30 -> 10.10.10.10 TCP tacacs > 8318 [SYN, ACK]
10.10.10.10 -> 10.30.30.30 TCP 8318 > tacacs [ACK]
10.10.10.10 -> 10.30.30.30 TACACS+ Q: Authentication
10.30.30.30 -> 10.10.10.10 TCP tacacs > 8318 [SYN, ACK]
10.30.30.30 -> 10.10.10.10 TACACS+ R: Authentication

After failing to handshake with the first configured TACACS server (VIP), the NAD continues on to authenticate against the alternate, 10.30.30.30 in this case.
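
The difference between the two captures can be sketched as a simplified model in Python. This is not the 3850's actual algorithm; the function name, the timeout value, and the choice to treat any connect failure as a reason to move on are assumptions made for illustration. The point is that a client walking an ordered server list only advances when the TCP handshake itself fails, so a VIP that answers SYN with SYN, ACK is selected even if it resets the session a moment later.

```python
import socket


def first_server_completing_handshake(servers, timeout=2.0):
    """Return the first (host, port) whose TCP handshake completes, else None.

    Simplified model of an ordered server list (an assumption, not IOS code):
    a server is skipped only when the connect attempt itself fails.
    """
    for address in servers:
        try:
            # create_connection returns once the three-way handshake is done.
            with socket.create_connection(address, timeout=timeout):
                return address
        except OSError:
            # Covers an unanswered SYN (timeout) as well as a refused connect.
            continue
    return None


if __name__ == "__main__":
    # Addresses taken from the captures above; TACACS+ uses TCP port 49.
    vips = [("10.20.20.20", 49), ("10.30.30.30", 49)]
    print(first_server_completing_handshake(vips))
```

With the default F5 behavior, the dead VIP still completes the handshake, so a loop like this would keep choosing 10.20.20.20. With the drop action in place, the SYN goes unanswered, the connect times out, and the loop moves on to 10.30.30.30.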

If you already have ISE load balanced on F5s or are setting up new F5s, test for this failure scenario, and apply the fix above if you observe the same behavior. It's possible that your F5s are already set up this way, but dropping connections was not the default behavior.
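
One way to run that test without a switch is a small TCP probe, sketched here in Python. The function name, the result labels, and the placeholder payload are all hypothetical; the payload is not a valid TACACS+ request and is only there to give the load balancer some data to act on after the handshake. The probe does not replace a packet capture, it just classifies what the VIP does with a new connection.

```python
import socket


def probe_vip(host, port=49, timeout=3.0, payload=b"\x00"):
    """Classify how a TCP endpoint treats a new connection.

    Labels (hypothetical names):
      "dropped"              SYN never answered within the timeout
      "refused"              connect attempt rejected outright
      "accepted_then_reset"  handshake completed, then RST
      "accepted_then_closed" handshake completed, then clean close
      "accepted_no_reply"    handshake completed, no answer in time
      "responded"            handshake completed and data came back
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "dropped"
    except ConnectionRefusedError:
        return "refused"
    with sock:
        try:
            sock.sendall(payload)  # placeholder bytes, not real TACACS+
            data = sock.recv(1024)
        except (ConnectionResetError, BrokenPipeError):
            return "accepted_then_reset"
        except socket.timeout:
            return "accepted_no_reply"
    return "responded" if data else "accepted_then_closed"


if __name__ == "__main__":
    # VIP address taken from the captures above.
    print(probe_vip("10.20.20.20"))
```

Run against a VIP whose PSNs are all down, "accepted_then_reset" would correspond to the first capture above and "dropped" to the behavior after the fix.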

This TCP RST/ACK behavior is documented in a couple of F5 knowledge base articles, for anyone interested in reading further.
https://support.f5.com/csp/article/K9812
https://support.f5.com/csp/article/K8082
