
After 3.4 Patch 4: replication with PAN stopped and CLI fails to connect

anilraj_003
Level 1

After 3.4 Patch 4, replication between the PAN and the PSNs stopped with the error "Jediss replication failed", and CLI access fails with the error "failed to connect to the server". The debug log shows:

Error, Failed to connect to server, could not connect to test-ise-01.net.lab/198.XX.XX.01:12001

Replication error from the PSN debug log: FullSync:- Primary address is null

 


@anilraj_003, please check port 12001 on both nodes:

ise/admin# show ports | include 12001
...
198.xx.xx.01:12001,
...

 

Check whether any service is in an initializing or not running state:

ise/admin# show application status ise

ISE PROCESS NAME          STATE      PROCESS ID
--------------------------------------------------
Database Listener         running    8436
Database Server           running    203 PROCESSES
Application Server        running    27824
...
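
If the status output has been saved to a text file (for example from a support bundle or an earlier session), the check above could be automated roughly as follows. This is only a sketch: the function name `unhealthy_services` is hypothetical, and it assumes each row has the form "process name, state, optional process ID" with the state being one of running, not running, initializing, or disabled.

```python
import re

# States reported as unhealthy here. "disabled" is deliberately ignored,
# on the assumption that unused services are normally shown that way.
UNHEALTHY_STATES = ("not running", "initializing")

# The lazy name group stops at the first state keyword it can reach.
_ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<state>not running|initializing|disabled|running)\b"
)

def unhealthy_services(status_output):
    """Return (process name, state) pairs that are initializing or not running."""
    found = []
    for line in status_output.splitlines():
        match = _ROW.match(line.strip())
        if match and match.group("state") in UNHEALTHY_STATES:
            found.append((match.group("name"), match.group("state")))
    return found
```

Header, separator, and "..." lines contain no state keyword, so they are skipped without special handling.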

 

Check if there is anything blocking the communication between the Nodes.
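
When the node CLI is not usable, a basic reachability test can be run from any host that sits on the same network path. The sketch below uses only the Python standard library; the helper name `tcp_port_open` is hypothetical. It only tells you whether a TCP connection to the port can be established, not whether the replication service behind it is healthy.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `tcp_port_open("test-ise-01.net.lab", 12001)` would test the port from the error message above. A False result from a PSN-side subnet but True from the PAN-side subnet would point at a firewall or ACL in between.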

 

For Cisco ISE port reference:

Cisco ISE Installation Guide - Release 3.4 - Port Reference

 

Hope this helps!

 

anilraj_003
Level 1

Thanks for the input. In our case, CLI access is not possible on any node: SSH, CIMC, and serial sessions all drop immediately after the login prompt, so we cannot run 'show application status ise' or verify ports locally. We have 15 physical nodes in total (SNS36XX), all of which are effectively dead.

Problem: This is not a replication problem. It is an OS / shell / service-layer collapse on all 15 nodes after 3.4 Patch 4: CLI sessions cannot start, PAN replication services are not responding, and the cluster is effectively brain-dead.

PSN logs confirm repeated connection attempts to PAN on port 12001, but the PAN replication service is not responding, and the Primary address becomes null. This is being investigated with TAC as an OS-level issue requiring recovery.

That’s it.
No back-and-forth needed.

@anilraj_003,

Interesting. I'm a bit curious: there is no CLI access via SSH or console, correct?

Are you able to remove two nodes from your 15-node cluster, the SPAN and a PSN, to create a small deployment, and test replication in this new cluster?

Note 1: if the answer is yes, the SPAN will be the PPAN of the new Cluster.

Note 2: I'm thinking of testing whether the problem was specific to PPAN or also to SPAN. If SPAN is OK, then you can rebuild your entire Cluster using SPAN.

 

Please keep us posted about the TAC investigation!

 

Best regards

 

anilraj_003
Level 1

Thanks for the suggestion. Unfortunately, CLI access has been lost on every node (P-PAN, S-PAN, PSNs, pxGrid, MNT). SSH, console, and CIMC KVM all behave the same way: authentication succeeds, but the shell session closes immediately. I can't share a screenshot on this platform, but after the login credentials are accepted the CLI reports "failed to connect server".

We already attempted a PAN role switch, promoting the S-PAN to be the new P-PAN, but it did not help. This indicates the issue is not specific to the P-PAN role but is systemic across the cluster. The replication debug log shows: "org.jgroups.protocols.TUNNEL -:::::- Failed connecting to GossipRouter at pan-test.com/192.168.1.1:12001"
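
To see at a glance which endpoints the nodes keep failing to reach, the debug logs can be summarised offline. The following is a rough sketch, not an official tool: the function name `failed_endpoints` is hypothetical, and it assumes the failing target always appears in the `hostname/address:port` form seen in the log lines quoted in this thread.

```python
import re
from collections import Counter

# Matches targets written as hostname/address:port, e.g. pan-test.com/192.168.1.1:12001
_ENDPOINT = re.compile(r"(?P<host>[\w.-]+)/(?P<addr>[\w.]+):(?P<port>\d+)")

def failed_endpoints(log_lines):
    """Count (host, address, port) targets on lines that report a connection failure."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        if "failed" not in lowered and "could not connect" not in lowered:
            continue
        for m in _ENDPOINT.finditer(line):
            counts[(m.group("host"), m.group("addr"), int(m.group("port")))] += 1
    return counts
```

Feeding it the lines of a collected debug log and printing `counts.most_common()` shows whether every failure points at the same PAN address and port or at several different ones.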

We are currently working with Cisco TAC/BU/Engineering, who are investigating this as an OS-level / recovery scenario. We’ll share updates once TAC completes the analysis.

@anilraj_003,

Thanks for your feedback. Please keep us posted!

 

Note: I've had issues in the past that would occur only in a Distributed Deployment. When I removed the SPAN from the Cluster and it became a Standalone, the problem was fixed, and from that point on I could recreate the Cluster via this SPAN.