Solved: Re: Anyconnect client on FTDv not passing traffic

Travis Hysuick · ‎04-21-2020

Hi all,

I'm working on a PoC using an FTD virtual appliance for AnyConnect VPN connectivity. The customer wants to migrate from legacy ASA to FPWR, and I thought this would be a relatively easy migration, but it has proven more challenging than I expected.

The actual topology is a pretty vanilla 3-legged appliance deployment with inside, outside, and DMZ zones/interfaces. The inside consists of a routed /30 transit network between the FTDv and an SVI terminating on a 3560-X L3 switch (the 3560-X has a handful of network segments on the inside, and I'm using basic static routing for reachability).

The VPN configuration is a full-tunnel deployment (SSL only, no DTLS), with RADIUS + X.509 for AAA, backed by an ISE 2.6 virtual appliance and a Windows Server 2016 VM running the AD, DNS/DHCP, CA, and OCSP responder roles. The dual auth is working as intended: the VPN client connects and is assigned an IP address from the correct DHCP pool on the Windows server.

That's where the PoC success more or less ends; once connected, the client is unable to access any resources either inside or outside the perimeter of the secure network.

I thought perhaps my NAT policy was missing something, however I've gone over it a million times and it's bitwise identical to the known working configuration on the ASA that I've always used and have never had an issue with.

Inside subnets (top 3 rules) --> VPN Client subnet == NO NAT (Manual static NAT exemption)

VPN Client subnet --> VPN Client subnet == NO NAT (Manual static NAT exemption)

VPN Client subnet --> Internet == Hairpin interface PAT on the outside (After-auto)
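To make the intended rule order concrete, here is a minimal Python sketch of first-match evaluation over the three rule groups above. It is only a model, not how the FTD implements NAT: the subnets are made-up placeholders (the real addressing isn't given in this post), the three inside rules are collapsed into one, interface pairs are ignored, and static rules are assumed to match in both directions.

```python
from ipaddress import ip_address, ip_network

# Hypothetical addressing, for illustration only (not taken from the post).
INSIDE = [ip_network(n) for n in ("10.10.10.0/24", "10.10.20.0/24", "10.10.30.0/24")]
VPN_POOL = [ip_network("10.99.0.0/24")]
ANY = [ip_network("0.0.0.0/0")]

# Rules in evaluation order: manual exemptions first, the after-auto rule last.
RULES = [
    ("inside <-> vpn exemption", "static", INSIDE, VPN_POOL, "no-nat"),
    ("vpn <-> vpn exemption", "static", VPN_POOL, VPN_POOL, "no-nat"),
    ("vpn -> internet hairpin", "dynamic", VPN_POOL, ANY, "pat-to-outside-interface"),
]


def _within(addr, nets):
    """True if addr falls inside any of the given networks."""
    return any(addr in net for net in nets)


def match_nat(src, dst):
    """Return (rule name, action) for the first rule matching the flow."""
    s, d = ip_address(src), ip_address(dst)
    for name, kind, srcs, dsts, action in RULES:
        forward = _within(s, srcs) and _within(d, dsts)
        # Assumption: static (identity) rules also match the reverse direction.
        reverse = kind == "static" and _within(s, dsts) and _within(d, srcs)
        if forward or reverse:
            return name, action
    return None, "no-match"
```

With this ordering, a VPN client talking to an inside host hits the exemption before the hairpin PAT rule ever gets a look, which is the behavior the rule list is meant to produce.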

I also reviewed the Access Control policy and am logging the default rule (set to block all communications) for any sign of the AC policy dropping the traffic. I'm not using the sysopt permit-vpn bypass, so my Access Control policy has specific permit statements to allow the ingress/egress traffic both to the internal hosts and to the Internet. I also have a Null0 route configured for the VPN client subnet and uRPF enabled as per best practices for VPN deployment, and I do see the explicit /32 host routes in the RIB on the FTDv when the client connects.
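For anyone wondering how the Null0 route and the per-client /32 routes coexist, the idea is plain longest-prefix matching. A rough Python sketch follows; the prefixes and next-hop labels are invented for illustration, and `build_rib` and `lookup` are hypothetical helpers, not anything from the FTD.

```python
from ipaddress import ip_address, ip_network


def build_rib(connected_clients):
    """Build a toy RIB: default route, Null0 for the pool, /32 per connected client."""
    rib = [
        (ip_network("0.0.0.0/0"), "outside-gateway"),  # hypothetical default route
        (ip_network("10.99.0.0/24"), "Null0"),         # hypothetical VPN pool
    ]
    for client in connected_clients:
        rib.append((ip_network(f"{client}/32"), "vpn-session"))
    return rib


def lookup(rib, dst):
    """Return the next hop of the most specific route that contains dst."""
    d = ip_address(dst)
    candidates = [(net, hop) for net, hop in rib if d in net]
    if not candidates:
        return None
    return max(candidates, key=lambda route: route[0].prefixlen)[1]
```

A connected client's address resolves to its /32 host route, while any unassigned address in the pool falls through to Null0 and is dropped instead of following the default route back out.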

When I do a Capture w/Trace in full tree view, the traffic is passing all of the requisite checks and is being allowed by AC, NAT, inspection, SNORT, IP-Options, etc.

Thinking it was possibly routing / L3 related, I tried to simply ping the VPN client by its private address FROM the FTDv appliance, but even this times out (although I can see the RX counter increment on the client with each successive ping from the FTDv).

This isn't platform-specific for the clients, as the behavior is identical on Mac, Windows, and iOS clients.

For reference, the software in use is FMCv / FTDv 6.4.0.8 on an ESXi 6.5 U3 host, with AnyConnect 4.8.02045 on the Mac/Win platforms, and whatever the latest AnyConnect iOS client version is today.

I would be very interested to know if any of the community folks have encountered similar behavior and what would need to be done to correct it? I haven't tried building this on a physical FTD as I don't have one handy for sandbox / dev work at the moment, and I don't see this being a hypervisor or switching issue as the traffic does seem to be passing through the FTDv.

Marvin Rhoads · ‎04-21-2020

Your NAT looks correct.

Is AnyConnect on your connected client showing that it is getting 0.0.0.0/0 for the VPN route?

Do the client's internal networks know to use the FTD inside interface for reachability of the VPN subnet?

Does "show vpn-sessiondb detail anyconnect" from the FTD cli indicate you are assigned to the desired tunnel-group and group-policy?

Travis Hysuick · ‎04-21-2020

Correct on all 3 counts Marvin,

- The AnyConnect client does get the 0.0.0.0/0 secured route (tunnel all networks, send all DNS queries over the tunnel)

- The FTDv is the single Internet egress for all the inside subnets (everything on the subnets behind the switch are just stubs)

- The session-db contains the expected tunnel-group and group-policy names and corresponding crypto attributes.

As I said, this is such a strange issue, and while I admit I'm not as strong on FPWR as I am on ASA, I'm not seeing anything that ~shouldn't~ work as intended here.

Marvin Rhoads · ‎04-21-2020

OK. So when you do the capture with trace, what does the packet capture show you?

Do you see the traffic from the VPN client towards the destination host?

Do you see any reply traffic? If not can you also capture from the destination host's perspective?

I just checked a capture on my lab's FTD-based remote access VPN and see my VPN client traffic leaving the inside interface and return traffic just fine. So capturing on the inside interface filtering on the destination host you are testing is a good approach to troubleshoot it.

Travis Hysuick · ‎04-22-2020

Good morning Marvin,

A Wireshark capture was definitely helpful here, and thank you for the suggestion. I was NOT seeing the incoming DNS queries from the AnyConnect client arrive at the DNS server, although the FTDv WAS forwarding the traffic to the switch's SVI as the next hop.

As it turns out, I'm hitting something of a bug here between the switch and the FTDv appliance:

My ISE policy set for the VPN connection was assigning an SGT to the client traffic in the AUTHZ policy. I can see that the FTD appliance was adding it to the frames as it passed them to the L3 SVI next hop on the inside (it shows up as "Cisco Metadata" in Wireshark). The switch itself appears to have been silently discarding these 'tagged' frames. The moment I took the SGT assignment out of the ISE AUTHZ policy...boom... passing VPN client traffic without an issue.
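For reference, this is roughly how a tagged frame can be told apart from an untagged one in a raw capture. It is a hedged Python sketch, not a full dissector: it assumes the Cisco MetaData header uses EtherType 0x8909 followed by a 1-byte version, a 1-byte length, a 2-byte option field, and a 2-byte SGT value, optionally sitting behind a single 802.1Q tag. `extract_sgt` is a hypothetical helper name.

```python
import struct

CMD_ETHERTYPE = 0x8909    # Cisco MetaData (assumed layout, see lead-in)
DOT1Q_ETHERTYPE = 0x8100  # 802.1Q VLAN tag


def extract_sgt(frame: bytes):
    """Return the SGT from a Cisco MetaData header, or None if the frame is untagged."""
    offset = 12  # skip destination and source MAC addresses
    if len(frame) < offset + 2:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype == DOT1Q_ETHERTYPE:
        offset += 4  # step over the VLAN tag to the next EtherType
        if len(frame) < offset + 2:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype != CMD_ETHERTYPE:
        return None
    if len(frame) < offset + 8:
        return None  # truncated metadata header
    # version (1 byte), length (1 byte), option field (2 bytes), SGT (2 bytes)
    _version, _length, _option, sgt = struct.unpack_from("!BBHH", frame, offset + 2)
    return sgt
```

A device that doesn't understand this header just sees an unfamiliar EtherType where it expected IPv4 (0x0800), which fits with the frames never making it past the switch.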

So the question now becomes: why would the switch choose to discard these SGT-tagged frames, or is this a default behavior? I have only rudimentary familiarity with the SXP/SGT stuff myself; I only added these elements to the ISE policies for learning at a later date (i.e., I have no TrustSec matrix configured, etc.).

Oddly enough, I have other access policies (for WLAN) on the same ISE instance that I have SGTs associated with, yet I have no problems with the WLAN, and an additional Wireshark capture on the switch shows that the WLC is not tagging any frames that it's forwarding to the switch. Again, this may be entirely expected behavior, but I'm left struggling to figure out what the real 'smoking gun' is here.

Travis Hysuick · ‎04-22-2020

Well, I think I answered my own question here...

Based on a Cisco Live presentation I ran across at <https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2016/pdf/BRKSEC-2203.pdf>, this is absolutely expected behavior: the 3560-X is not configured for CTS, so it treats the 'tagged' frames from the FTDv as malformed and discards them.

So as it turns out, it can be a risky thing to have components of the TrustSec stack enabled if your network is not end-to-end configured to support it.

Case closed...Marvin, thank you again for the poke to run back to Wireshark for answers.

Marvin Rhoads · ‎04-22-2020

Correct. I was pretty sure that was the case but hadn't taken the time to track down a citation. The one you found explains it perfectly. Thanks for sharing that.

One poorly understood concept about TrustSec (I misunderstood it myself when I first tried to implement it) is that you need to define the TrustSec domain carefully and verify that every network device in the domain has the prerequisite capabilities and configuration to accommodate the desired end-to-end behavior. That's not a trivial exercise and (in my opinion) a significant barrier to adoption.