
ISE Secondary node sync up failed on 3.4 after automatic PAN failover

francois-smith
Level 1

We have a 3-node deployment: ISE1-PAN (Admin, MNT, PSN), ISE2-SPAN (Admin, MNT, PSN), and ISE3-HEALTH (Health node).

To test the failover we shut the network ports on ISE1-PAN. After the configured polling intervals, ISE3-HEALTH triggered the failover to ISE2-SPAN and the failover completed successfully. We then unshut the ports on ISE1-PAN, which went through the process of becoming the secondary PAN. Replication had stopped, so we did a full sync of the server, which was successful. After that, ISE1-PAN was connected and all the nodes in the deployment were synced and connected.


We then triggered another failover by shutting the ports on ISE2-SPAN, which was now primary, so that we could test the automatic PAN failover from the new PAN (ISE2-SPAN) back to ISE1-PAN as the primary. Everything appeared to work well: ISE1-PAN became primary again as before and ISE2-SPAN became secondary. However, when we attempted a full sync on ISE2-SPAN, the sync action did not start as it had when we did the same with ISE1-PAN. The application services on ISE2-SPAN did restart, but after they all came back up the node was still not in sync and still reported that a manual sync was required.

We first restarted the node, but that made no difference, and eventually we de-registered ISE2-SPAN from the deployment. However, when we attempted to register the node back into the deployment we got this message:

Certificate hierarchy must terminate with certificate in trusted store :

ISE2-SPAN.domain.com

We have checked that the Trusted Certificates and System Certificates on the current PAN (ISE1-PAN) and on ISE2-SPAN are the same, but we cannot figure out what the message is referring to.

Can anyone point us in the right direction on what this message means, which certificate we need to import, and on which server?
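For readers unfamiliar with the wording of the error, the sketch below illustrates what "hierarchy must terminate with certificate in trusted store" is generally taken to mean: following issuer links from the node's certificate has to end at a certificate the registering side already trusts. This is a simplified model and not how ISE implements the check. The function names, the subject-to-issuer dictionary, and the CA names ("Example Issuing CA", "Example Root CA") are all hypothetical; only the hostname comes from the error above.

```python
# Simplified illustration only: certificates are reduced to subject -> issuer names.
# Real validation also checks signatures, validity dates, and key usage.

def find_chain_end(leaf, issuer_of):
    """Follow issuer links from the leaf and return the last name reached."""
    current = leaf
    visited = {current}
    while True:
        issuer = issuer_of.get(current)
        # Stop at an unknown certificate, a self-signed one, or a loop.
        if issuer is None or issuer == current or issuer in visited:
            return current
        visited.add(issuer)
        current = issuer


def terminates_in_trusted_store(leaf, issuer_of, trusted_subjects):
    """True if the end of the chain is a certificate present in the trusted store."""
    return find_chain_end(leaf, issuer_of) in trusted_subjects


# Hypothetical CA-signed chain for the node named in the error message.
ca_signed = {
    "ISE2-SPAN.domain.com": "Example Issuing CA",
    "Example Issuing CA": "Example Root CA",
    "Example Root CA": "Example Root CA",  # self-signed root
}
# Hypothetical self-signed node certificate.
self_signed = {"ISE2-SPAN.domain.com": "ISE2-SPAN.domain.com"}

print(terminates_in_trusted_store("ISE2-SPAN.domain.com", ca_signed, {"Example Root CA"}))
print(terminates_in_trusted_store("ISE2-SPAN.domain.com", ca_signed, {"Another Root CA"}))
print(terminates_in_trusted_store("ISE2-SPAN.domain.com", self_signed, {"Example Root CA"}))
```

In this model the first call succeeds, while the other two fail: once because the root is missing from the trusted store, and once because a self-signed node certificate is its own chain end and is not itself trusted. Under that reading, the practical thing to compare is the issuer chain of the admin certificate on the node being registered against the Trusted Certificates list on the PAN.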

 

8 Replies

Why do you have PAN auto-failover enabled? 

It is the design of the deployment architecture. We are still in pre-production and busy testing the failover scenarios, and this is what happened during testing. We now want to recover to the point where all the nodes are back and in sync.

ISE1-PAN and ISE3-HEALTH are both operational in the deployment, but we are unable to get ISE2-SPAN registered. It is currently in standalone mode, with the configuration that was synced to it when it was last successfully registered.

Why? What exactly are you hoping to gain by using that feature?

francois-smith
Level 1

The purpose is to test the feature, see how it operates, and determine whether it is suitable for our environment; if not, we will use the manual failover procedure. We have now encountered this issue and need some guidance on how to recover to the previous state so that we can test the manual failover procedure.

Do you have any guidance on how we might recover so that we can disable the feature and test the manual failover?

What patch level of 3.4 is this?

francois-smith
Level 1

Patch 3

Arne Bier
VIP

@francois-smith - whether or not automatic PAN failover is good or bad, that's up to the individual - my personal opinion is that it's not worth it, because it doesn't solve any problems, unless ISE also had a built-in FHRP/VIP concept that allowed me to always hit the same IP/DNS. Anyway ..

The issue you describe sounds like corrupted config that smells like a TAC case. My own experience with 3.4p3 is that after a 3.3 to 3.4 upgrade, all was well while I was on SPAN, and then to conclude the upgrade, I manually promoted PPAN back to Primary. That's when the wheels came off. The PPAN was active, and all other nodes had a red icon against them. I waited but they didn't recover. I had to sync each one manually - and even after that, I now have daily sync failures on all nodes, that the TAC say are "cosmetic" - that gold star that the BU assigns to software versions means very little in my opinion.  The solution works, but it doesn't give you the impression that it deserves any gold stars.

Hi @francois-smith ,

Every time I encounter these types of errors, I prefer to reset the ISE application configuration to factory defaults, including the certificates:

ise/admin# application reset-config ise
Initialize your Application configuration to factory defaults? (y/n): Y
Leaving currently connected AD domains if any...
Please rejoin to AD domains from the administrative GUI
Retain existing Application server certificates? (y/n): N
...

Note 1: run this on your SPAN (which is now a standalone node) before re-registering it with the PPAN.

Note 2: this procedure always cleans "invisible garbage"  :  )

 

Hope this helps !