12-04-2018 01:20 AM - edited 03-11-2019 01:52 AM
Hi,
We are in the process of migrating from ACS v5.8 to ISE v2.4 for a large international customer. The customer has a fully distributed deployment operating across 3 regions/continents (EMEA, APAC, AM). ISE is used for 802.1X and VPN authentication, with AD integration.
In terms of ISE, we have:
In terms of AD, a Microsoft multi-domain forest is in use, with two-way trust between all domains in use. There are basically 4 domains:
AD servers are located in each region for each subdomain, and each server is at the same time a DC (Domain Controller), a GC (Global Catalog) and a DNS server:
We are using Microsoft Sites and Services, as recommended, for all locations. All PSN nodes are in the correct Sites, and there is practically no latency between e.g. AM PSN nodes and AM DCs (they are on the same L3 device).
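For reference, site-aware DC discovery relies on site-specific DNS SRV records, so one sanity check is to confirm that those records exist for the site each PSN falls into. A minimal sketch that only builds the record names; the site name `AM-Site` and the forest root `mydomain.com` are assumptions for illustration, not values from our environment:

```python
def site_srv_names(domain, forest_root, site):
    """Build the site-specific SRV record names that AD clients query.

    domain      -- DNS name of the joined domain, e.g. emea.mydomain.com
    forest_root -- DNS name of the forest root (assumed: mydomain.com)
    site        -- AD site name (hypothetical here)
    """
    return {
        # Domain controllers of `domain` registered in the given site
        "dc": f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
        # Global catalogs in the given site are registered under the forest root
        "gc": f"_ldap._tcp.{site}._sites.gc._msdcs.{forest_root}",
        # Kerberos KDCs of `domain` in the given site
        "kdc": f"_kerberos._tcp.{site}._sites.dc._msdcs.{domain}",
    }


if __name__ == "__main__":
    # "AM-Site" is a placeholder site name
    for role, name in site_srv_names("emea.mydomain.com", "mydomain.com", "AM-Site").items():
        print(role, name)
```

Each printed name can then be resolved with `nslookup -type=SRV <name>` from a host that uses the same DNS servers as the PSN, to see which DCs and GCs are actually returned for that site.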
After a successful configuration migration, we joined the PAN, MnT and PSN nodes to emea.mydomain.com (as EMEA is the main data center). Because of the two-way trust, we can successfully pull AD groups from all subdomains and authenticate users cross-domain. Thanks to this approach, we can use a single JP (Join Point) as a reference in our policies. Everything works ideally, however...
From time to time, especially during peak hours for a given region, we receive "High authentication latency" alarms. As the threshold for this alarm is 10s, I'm a bit worried about this one. We do have high-speed WAN links between regions, but utilization peaks can still happen. Also, since we are using Sites and Services, I would expect minimal cross-domain communication from the ISE standpoint (I'm aware that there must be some, e.g. when an EMEA user roams to AM and authenticates against AM PSNs).
I took a packet capture, and I can confirm that the AM PSN is talking to the AM DC for the captured RADIUS authentication. There is some communication from the AM PSN back to the EMEA DC, but this is to be expected as the node is joined to emea.mydomain.com. I can see high latency for multiple ISE services and scenarios, e.g.:
All of the alarms are raised for the APAC and AM regions, but never for EMEA, which makes me question the design of the AD integration. Also, the alarms are not raised for all authentications, nor all the time, so there is no obvious pattern.
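To look for a pattern, one option is to bucket the slow authentications by region and hour of day. A rough sketch, assuming the latency samples have been exported as `(region, timestamp, latency_ms)` tuples; that tuple layout and the function name are my own assumptions, not an ISE export format:

```python
from collections import Counter
from datetime import datetime

# Alarm threshold of 10 s mentioned above, expressed in milliseconds
THRESHOLD_MS = 10_000


def slow_auth_counts(samples, threshold_ms=THRESHOLD_MS):
    """Count authentications at or above the threshold per (region, hour).

    samples -- iterable of (region: str, timestamp: datetime, latency_ms: int);
               this layout is hypothetical and must match however the data is exported
    """
    counts = Counter()
    for region, timestamp, latency_ms in samples:
        if latency_ms >= threshold_ms:
            counts[(region, timestamp.hour)] += 1
    return counts
```

Sorting the result with `counts.most_common()` would show whether the slow authentications cluster around particular hours in APAC and AM.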
I have already gone through tons of documentation and Live sessions, but there is no document describing how a system should be designed/deployed with a multi-domain forest: which nodes to join to which domain/subdomain, how to build policies based on that approach, etc.
Could you please shed some light on this matter? Does anyone have experience or recommendations for deployments like this?
Thanks