Cisco ISE cluster formation best practices - Endpoints status issue

ing.a.cepele1
Level 1

Hello,

 

We have 6 ISE nodes and we would like to assign a particular role to each of them. At the moment 2 of them are in the cluster and have all the personas assigned, with one node as primary for each persona and the other as secondary. The other 4 ISE nodes are standalone. Since it is a production environment, we would like to know the best practice for forming the 6-node cluster.

 

We did some testing in our lab, trying the scenario of joining each standalone node into the cluster one by one: first we removed a role from an existing node and then assigned it to the newly joining node.

Generally this worked well, but we experienced some issues: the connectivity status of the endpoints was not correct. Devices were later listed as disconnected even though they were still connected. Endpoints that joined the network while PSN1 (RADIUS) was down also showed an incorrect connectivity status. That was solved by resetting the switch port, but such a workaround is not acceptable in production.

3 Replies

Arne Bier
VIP

Hello @ing.a.cepele1 

 

With 6 ISE nodes there are a few Cisco-approved configurations that I can think of; others are possible, but not officially supported:

 

Option 1 assumes either that you have a load balancer to spread load evenly across all 4 PSNs, or, if the PSNs are scattered across various locations, that no load balancer is needed. It depends on your design; just think about how you plan to use those PSNs (perhaps dedicate primary/secondary pairs to certain sites, or dedicate some nodes to RADIUS, portals, TACACS, etc.):

  1. Primary PAN/Primary MnT
  2. Secondary PAN/Secondary MnT
  3. PSN 1
  4. PSN 2
  5. PSN 3
  6. PSN 4

The other approach is to plan for massive scalability by splitting PAN/MnT; this gives you the option to scale up to 50 PSNs:

  1. Primary PAN - Admin only
  2. Secondary PAN - Admin only
  3. Primary MnT - Monitoring only
  4. Secondary MnT - Monitoring only
  5. PSN 1 - Services
  6. PSN 2 - Services

 

As for the experience you encountered, I suspect the session database was not being synchronised across the PSN nodes. But this should not be an issue if you bring the PSNs online in the desired design (as shown above) and then LEAVE IT ALONE! As soon as a NAD (switch/WLC) latches onto a RADIUS/TACACS server, it stays with that server until the server stops responding (e.g. the PSN is rebooted, patched, etc.); if that PSN is stable, the NAD will be happy. In most cases a NAD only needs two AAA servers (primary and secondary). Alternatively, use a load balancer VIP, but then make sure you know what you're doing (the VIP is a single point of failure, and you have to manage session persistence).
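If you want to check what the session database actually believes about a given endpoint, you can query the Monitoring (MnT) REST API directly. Here's a minimal sketch, assuming the MnT API is reachable on your Monitoring node and your admin account has Monitoring permissions; the hostname, credentials, and MAC address are placeholders, and the Session/MACAddress path should be confirmed against the Session Management API reference for your ISE version:

```python
import requests

# Placeholders - replace with your MnT node and an admin account
# that has Monitoring permissions.
MNT_HOST = "ise-mnt.example.com"
USER = "admin"
PASSWORD = "changeme"
MAC = "AA:BB:CC:DD:EE:FF"  # endpoint to check

# Session lookup by MAC address (ISE Monitoring REST API);
# the response is XML describing the most recent session for that MAC.
url = f"https://{MNT_HOST}/admin/API/mnt/Session/MACAddress/{MAC}"

resp = requests.get(url, auth=(USER, PASSWORD), verify=False)  # verify=False for lab testing only
resp.raise_for_status()
print(resp.text)  # inspect the session state, e.g. which PSN authenticated it
```

Comparing that output against what the switch reports (show authentication sessions) tells you whether the problem is really a stale session directory or something on the NAD side.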

 

Distributed ISE designs and deployments are tried and tested; if you plan the deployment right from day 1, implement it, and LEAVE IT ALONE, then it runs fine.

 

 

ing.a.cepele1
Level 1

Hi Arne,

 

Thanks for your insight.

 

No load balancer is in place; the ISE nodes are located at different sites.

 

Our target design is the one you mentioned:

  1. Primary PAN - Admin only
  2. Secondary PAN - Admin only
  3. Primary MnT - Monitoring only
  4. Secondary MnT - Monitoring only
  5. PSN 1 - Services
  6. PSN 2 - Services

Currently, we have 2 nodes in the cluster. All personas (PAN, MnT, PSN) are assigned across these 2 nodes, either as primary or secondary. In the end, both of these nodes will serve as PSNs only. The AAA servers are running on these nodes, so we have a primary and a secondary.

 

  • Since we have a live environment, how do you suggest forming the 6-node cluster? We tested adding the nodes one by one.

In our current environment it is impossible to make this change without affecting a node's status.

Basically, when we removed the MnT and PAN roles from node 1, it restarted, and then we encountered the issue. The node came back online as PSN1, but we then faced the endpoint connectivity status problem: some of the devices joined via PSN2 and their status was not correct, even though the devices themselves were connected to the network. Yes, we assume it is a session database synchronisation issue, but we don't know how to solve it. This is in fact why I opened this topic, since endpoint status is linked to license consumption. We need either a suggested cluster formation order or a fix for the DB sync.
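As a less disruptive alternative to the switch port reset, a CoA reauthentication can be triggered through the same MnT REST API. A minimal sketch, assuming the documented CoA/Reauth endpoint is available on our version; the node name, credentials, and MAC are placeholders, and the reauth type codes should be verified against the API reference:

```python
import requests

MNT_HOST = "ise-mnt.example.com"   # placeholder MnT node
USER, PASSWORD = "admin", "changeme"
PSN_NAME = "ise-psn1"              # ISE node that owns the session
MAC = "AA:BB:CC:DD:EE:FF"          # endpoint with the stale status
REAUTH_TYPE = 1                    # 0=DEFAULT, 1=LAST, 2=RERUN (per the MnT API docs)

# CoA Reauth: asks the NAD to reauthenticate the endpoint
# without bouncing the switch port.
url = f"https://{MNT_HOST}/admin/API/mnt/CoA/Reauth/{PSN_NAME}/{MAC}/{REAUTH_TYPE}"
resp = requests.get(url, auth=(USER, PASSWORD), verify=False)  # lab only
print(resp.status_code, resp.text)  # XML result indicates whether the CoA was accepted
```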

 

Thank you again Arne

Arne Bier
VIP

Am I correct in saying that you decided to run AAA services on the first two ISE nodes because their IP addresses were already used as AAA servers in the rest of your network (i.e. you replaced the last two RADIUS servers with ISE and kept the IP addresses)?

If not, and if you have the ability to go around and change the AAA details in your NASs, then the better approach would be to build your ISE cluster in peace and then move services onto nodes 5 and 6 (PSN 1 and PSN 2). Or is that a pipe dream?
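If re-pointing the NADs is feasible, the change itself can be scripted rather than done box by box. A minimal sketch using Netmiko against Cisco IOS switches; the switch IPs, credentials, new PSN addresses, and shared secret are all placeholders, and the RADIUS configuration lines should be adapted to your platform and software version:

```python
from netmiko import ConnectHandler

# Placeholders: switch management IPs and the new PSN addresses (nodes 5 and 6).
SWITCHES = ["10.0.0.10", "10.0.0.11"]
NEW_PSNS = [("ISE-PSN1", "10.10.20.15"), ("ISE-PSN2", "10.10.20.16")]

# Build the new RADIUS server definitions and server group (IOS-XE style).
config = []
for name, ip in NEW_PSNS:
    config += [
        f"radius server {name}",
        f" address ipv4 {ip} auth-port 1812 acct-port 1813",
        " key MySharedSecret",  # placeholder shared secret
    ]
config += ["aaa group server radius ISE-GROUP"]
config += [f" server name {name}" for name, _ in NEW_PSNS]

for host in SWITCHES:
    conn = ConnectHandler(device_type="cisco_ios", host=host,
                          username="admin", password="changeme")
    conn.send_config_set(config)   # push the new AAA server details
    conn.save_config()             # write to startup-config
    conn.disconnect()
```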

 

In your case, you stood up node 1 (PAN/MnT/PSN), which became the de facto master database, and then registered node 2 (PAN/MnT/PSN), which became the de facto secondary database.

 

Here's the sequence:

  1. Deploy Node 1 - has Primary Admin, Primary MnT, Services
  2. Register Node 2 - assign Secondary Admin, Secondary MnT, no Services
  3. Register Node 3 - assign Primary MnT and nothing else (MnT is then removed from Node 1)
  4. Register Node 4 - assign Secondary MnT and nothing else (MnT is then removed from Node 2)
  5. Register Node 5 - Services only
  6. Register Node 6 - Services only
  7. Promote Node 1 to Primary Admin and Primary MnT - ensure no Services persona is running on Nodes 1-4
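On recent ISE versions (3.x) this sequence can also be driven through the Node Deployment OpenAPI rather than the GUI. A rough sketch, assuming the OpenAPI is enabled on the PAN; the path /api/v1/deployment/node is the documented endpoint, but the exact role and service names vary by release, so treat the payload fields below as assumptions and confirm them in the Swagger UI on your PAN:

```python
import requests

PAN = "ise-pan.example.com"       # placeholder primary PAN
AUTH = ("api-admin", "changeme")  # admin account with OpenAPI access
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}

# Ordered registration plan mirroring the sequence above.
# Role/service names are assumptions - confirm in your version's Swagger UI.
PLAN = [
    {"fqdn": "ise-node2.example.com", "roles": ["SecondaryAdmin", "SecondaryMonitoring"], "services": []},
    {"fqdn": "ise-node3.example.com", "roles": ["PrimaryDedicatedMonitoring"], "services": []},
    {"fqdn": "ise-node4.example.com", "roles": ["SecondaryDedicatedMonitoring"], "services": []},
    {"fqdn": "ise-node5.example.com", "roles": [], "services": ["Session", "Profiler"]},
    {"fqdn": "ise-node6.example.com", "roles": [], "services": ["Session", "Profiler"]},
]

# Register one node at a time and let each node finish syncing
# before moving to the next, just as you would in the GUI.
for node in PLAN:
    body = {**node, "userName": "admin", "password": "changeme"}  # the node's local admin credentials
    r = requests.post(f"https://{PAN}/api/v1/deployment/node",
                      json=body, auth=AUTH, headers=HEADERS, verify=False)  # lab only
    r.raise_for_status()
    print("registered", node["fqdn"])
```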

 

I can't remember whether you can register a node and assign it the Primary Admin persona - I'm thinking it's not possible. If it is possible, then you could keep your current nodes 1/2 as PSN-only after registering the remaining nodes and assigning them the relevant personas. Someone like @Damien Miller might have a fresher perspective on it.