pxGrid fails to start - After upgrade to v2.4

nspasov
Cisco Employee

Hello ISE experts-

I just upgraded my lab to ISE 2.4 (from 2.3 with the latest patch). After the upgrade, the system does not let me enable or start the pxGrid service. When I try to enable it, I get the following error:

Node edit failed: Could not enable PxGrid as there was a problem in importing PxGrid wallet certificate from NS-ISE-B1. Please check the node and try again.

I get the above error on both of my nodes. Here is what I have tried so far:

- Restarted both ISE nodes

- Checked internal documentation

- Generated a self-signed certificate

- Issued a CA (Microsoft CA) signed certificate

- Issued an internal (ISE CA) signed certificate

Thank you in advance!

Accepted Solutions

hslai
Cisco Employee

Are you using mixed cases in the hostname or the domain name? If so, the symptom looks similar to CSCvg38371.

You may either open a TAC case or unicast me a copy of your ISE CFG backup so I can try recreating your issue.

[2018-05-02] CSCvj31112 is a new bug filed for ISE 2.4.

nspasov
Cisco Employee

Thank you for the tip! That was the exact problem! The hostname was in uppercase while the domain was in lowercase. Changing the hostname to lowercase and re-issuing the certificate fixed the problem!

Btw, I think you referenced the wrong bug ID. I think the right one is CSCvc62414, which for some reason does not offer a "workaround".

Neno
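
For anyone who wants to check for the mixed-case condition before upgrading, here is a minimal Python sketch. It only flags DNS labels that contain uppercase characters; the function name and the `example.com` domain are made up for illustration, and it does not talk to ISE in any way.

```python
from typing import List


def uppercase_labels(fqdn: str) -> List[str]:
    """Return the DNS labels of fqdn that contain uppercase characters."""
    labels = fqdn.rstrip(".").split(".")
    return [label for label in labels if label != label.lower()]


# Hypothetical FQDN: the hostname comes from the error message in this
# thread, the domain is a placeholder.
print(uppercase_labels("NS-ISE-B1.example.com"))  # ['NS-ISE-B1']
print(uppercase_labels("ns-ise-b1.example.com"))  # []
```

An empty list means the name is all lowercase, which is the state that worked in this thread.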

Did you simply change the hostname of the ISE node from the CLI and then create new certificates for pxgrid?

Yes, here are the exact steps that I performed on my 2 node deployment:

1. De-registered the secondary node

2. Made both nodes standalone

3. In CLI, changed the hostnames of both nodes to be in lower case

4. Promoted node #1 to Primary

5. Registered #2 as Secondary

6. Re-issued pxGrid certificates

7. Enabled the pxGrid service

I did the change and I am able to start the pxGrid service on my secondary node.

But it disables the pxGrid service on the primary node, which leaves only one running.

Could this be a licensing issue? I don't recall reading about this in the release notes for ISE 2.4.

I just checked my lab deployment and can confirm that pxGrid is running on both the primary and the secondary node. So what happens when you try to enable it on the primary node?

On node 1 the pxGrid services are disabled, and on node 2 they are running. In the GUI both nodes are configured with pxGrid services, but show application status ise tells a different story.

If I then disable the services on node 1 and re-enable them, the services start on node 1 and go into disabled on node 2.

Hmm, that sounds like a different issue. Have you looked at the pxgrid logs for more clues? It is probably worth opening a TAC case as well.

I've just added a secondary ISE node to my home lab, and the result is the same. Both nodes were deployed from the 2.4 code.

Did you verify the pxGrid services via the CLI? I'm asking because both nodes do have pxGrid enabled in the GUI!

I'll deploy a new setup just to verify the behaviour...

I've tested it on a new deployment, and the results are the same. Here are the steps I went through. I used the default certificates that are generated during installation.

- deployed the OVA to VMware

- made the initial config in the CLI

- created DNS records, forward and reverse

- made node 1 primary

- registered secondary node

- waited for sync....

- enabled pxGrid on both nodes...

- verified pxgrid service status > running on node 1, disabled on node 2

- re-enabled it on node 2, waited, verified pxgrid service status > running on node 2, disabled on node 1

I went through the logs on the secondary node. In pxgrid-server.log there are several ERROR entries like this one:

ERROR > Thread-68][] com.cisco.pxgrid.Configuration -:::::- Failed to connect to host The following addresses failed: 'ise1.***.dk:5222' failed because java.net.ConnectException: Connection refused (Connection refused)ise1.*****.dk
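
A "Connection refused" on port 5222 can also be reproduced from any machine that can reach the node. The following is a minimal Python sketch using only the standard library; the function name and the default timeout are my own choices, not anything from ISE.

```python
import socket


def port_open(host: str, port: int = 5222, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS resolution failures.
        return False
```

Calling `port_open("<FQDN of the other pxGrid node>")` should return False while that node's XMPP listener is closed. Note that this only tests TCP reachability, not whether pxGrid itself is healthy.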


On the primary node, pxgrid-cm.log has entries stating that the listening port was closed:

[2018-05-12T20:45:53Z] [INFO ] [cm-1.jabber] [Resolver.cpp:127] [] Starting resolver lookup for '127.0.0.1:puny=127.0.0.1:service=(nil):defport=7400'

[2018-05-12T20:45:53Z] [INFO ] [cm-1.jabber] [Resolver.cpp:355] [] res_querydomain for '127.0.0.1:puny=127.0.0.1:service=(nil):defport=7400' took 0.000113s

[2018-05-12T20:45:53Z] [INFO ] [cm-1.jabber] [Resolver.cpp:260] [] getaddrinfo for '127.0.0.1:puny=127.0.0.1:service=7400:defport=7400' took 0.000002s

[2018-05-12T20:45:53Z] [INFO ] [cm-1.jabber] [Resolver.cpp:142] [] Finished resolver lookup for '127.0.0.1:puny=127.0.0.1:service=7400:defport=7400'. Took 0.000219s

[2018-05-12T20:45:54Z] [INFO ] [cm-1.jabber] [SocketWatcher.cpp:449] [] Creating Listening Socket: 0x00007ff9d4004860 IP: 0.0.0.0 Port: 5222

[2018-05-12T20:45:54Z] [INFO ] [cm-1.jabber] [Resolver.cpp:127] [] Starting resolver lookup for '0.0.0.0:puny=0.0.0.0:service=5222:defport=0'

[2018-05-12T20:45:54Z] [INFO ] [cm-1.jabber] [Resolver.cpp:260] [] getaddrinfo for '0.0.0.0:puny=0.0.0.0:service=5222:defport=0' took 0.000025s

[2018-05-12T20:45:54Z] [INFO ] [cm-1.jabber] [Resolver.cpp:142] [] Finished resolver lookup for '0.0.0.0:puny=0.0.0.0:service=5222:defport=0'. Took 0.000198s

[2018-05-12T20:45:54Z] [INFO ] [cm-1.jabber] [JSMCommandProcessor.cpp:1488] [] No session state data available, sending item-not-found to host 'jsm-1.jabber'

[2018-05-12T20:45:54Z] [INFO ] [cm-1.jabber] [SASLManager.cpp:198] [] Failed to query auth component for SASL mechanisms

[2018-05-12T20:46:23Z] [INFO ] [cm-1.jabber] [SASLManager.cpp:198] [] Failed to query auth component for SASL mechanisms

[2018-05-12T20:48:54Z] [INFO ] [cm-1.jabber] [cm-1_jsmcp-1.jabber] [] Got a FINALIZE command from jabberd.

[2018-05-12T20:48:54Z] [INFO ] [cm-1.jabber] [BasicSocket.cpp:477] [] Closing Listening Socket: 0x00007ff9d4004860, IP: 0.0.0.0 Port: 5222

[2018-05-12T20:48:54Z] [INFO ] [cm-1.jabber] [cm-1_jsmcp-1.jabber] [] Sending jabberd a command ack.

[2018-05-12T20:48:54Z] [INFO ] [cm-1.jabber] [cm-1_jsmcp-1.jabber] [] Got a SHUTDOWN command from jabberd.

[2018-05-12T20:48:54Z] [INFO ] [cm-1.jabber] [cm-1_jsmcp-1.jabber] [] Sending jabberd a command ack.

[2018-05-12T20:48:54Z] [INFO ] [cm-1.jabber] [cm] [] ConnectionManager removing processor "cm-1_jsmcp-1.jabber"

[2018-05-12T20:49:02Z] [INFO ] [cm-1.jabber] [cm] [] Router connection is shutting down.
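
To see how long the listener on port 5222 stayed open, the "Creating Listening Socket" and "Closing Listening Socket" lines can be pulled out of the log with a short script. This is a rough sketch that assumes the line format shown in the excerpt above; the regular expression and function name are my own and have not been checked against other pxgrid-cm.log variants.

```python
import re
from datetime import datetime

# Matches the timestamp, the action and the port of listening-socket lines.
SOCKET_EVENT = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*?(?P<action>Creating|Closing) Listening Socket: "
    r"\S+ IP: (?P<ip>\S+) Port: (?P<port>\d+)"
)


def socket_events(lines):
    """Return (timestamp, action, port) tuples for listening-socket lines."""
    events = []
    for line in lines:
        match = SOCKET_EVENT.search(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%SZ")
            events.append((ts, match.group("action"), int(match.group("port"))))
    return events


# Two lines copied from the log excerpt above.
sample = [
    "[2018-05-12T20:45:54Z] [INFO ] [cm-1.jabber] [SocketWatcher.cpp:449] [] "
    "Creating Listening Socket: 0x00007ff9d4004860 IP: 0.0.0.0 Port: 5222",
    "[2018-05-12T20:48:54Z] [INFO ] [cm-1.jabber] [BasicSocket.cpp:477] [] "
    "Closing Listening Socket: 0x00007ff9d4004860, IP: 0.0.0.0 Port: 5222",
]
events = socket_events(sample)
opened, closed = events[0][0], events[1][0]
print((closed - opened).total_seconds())  # 180.0
```

In this excerpt the socket was open for three minutes before jabberd sent the FINALIZE and SHUTDOWN commands.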

Please engage Cisco TAC, if you have not done so already, as it is working OK for Neno.

If these entries appear on your primary ISE node after pxGrid is enabled, then pxGrid on the primary node appears to have an issue. You would likely either see pxGrid shown as not connected on the pxGrid Services page in ISE, or be unable to connect an external pxGrid client. The attached zip file contains sample logs from one of the lab pods, to give you an idea of what the logs look like when pxGrid is running OK.

If you are using the EVAL OVA file, then the memory is too low for a primary ISE node that also runs other services. Please increase the resources to at least 16 GB RAM and 4 CPU cores. Running more services would need more memory.

Both my lab nodes have 16 GB of memory and 4 cores, but CPU reservation has been disabled.

On the pxGrid Services page everything looks good, but the CLI tells a different story. See the attached screenshot: pgrid.PNG

I'm going to deploy 2.3 later today in my lab.

Reading through the pxGrid configuration documentation, this behaviour is the default:

https://www.cisco.com/c/en/us/td/docs/security/ise/2-4/admin_guide/b_ise_admin_guide_24/b_ise_admin_guide_24_new_chapter_011.html?bookSearch=true

pxGrid Node

You can use Cisco pxGrid to share the context-sensitive information from Cisco ISE session directory with other network systems such as ISE Eco system partner systems and other Cisco platforms. The pxGrid framework can also be used to exchange policy and configuration data between nodes like sharing tags and policy objects between Cisco ISE and third party vendors, and for other information exchanges. pxGrid also allows 3rd party systems to invoke adaptive network control actions (EPS) to quarantine users/devices in response to a network or security event.

The TrustSec information like tag definition, value, and description can be passed from Cisco ISE via TrustSec topic to other networks. The endpoint profiles with Fully Qualified Names (FQNs) can be passed from Cisco ISE to other networks through a endpoint profile meta topic. Cisco pxGrid also supports bulk download of tags and endpoint profiles. You can publish and subscribe to SXP bindings (IP-SGT mappings) through pxGrid. For more information about SXP bindings, see Security Group Tag Exchange Protocol.

In a high-availability configuration, Cisco pxGrid servers replicate information between the nodes through the PAN. When the PAN goes down, pxGrid server stops handling the client registration and subscription. You need to manually promote the PAN for the pxGrid server to become active. You can check the pxGrid Services page (Administration > pxGrid Services) to verify whether a pxGrid node is currently in active or standby state.

For XMPP (Extensible Messaging and Presence Protocol) clients, pxGrid nodes work in Active/Standby high availability mode which means that the pxGrid Service is in "running" state on the active node and in "disabled" state on the standby node.


After the automatic failover to the secondary pxGrid node is initiated, if the original primary pxGrid node is brought back into the network, the original primary pxGrid node will continue to have the secondary role and will not be promoted back to the primary role unless the current primary node goes down.
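
The active/standby behaviour quoted above, including the fact that a recovered node does not take the active role back, can be illustrated with a toy state model. This is only a sketch of the documented behaviour as I read it, not ISE's actual implementation; the class, method and node names are hypothetical.

```python
class XmppPxGridPair:
    """Toy model of two pxGrid nodes in active/standby mode (no preemption)."""

    def __init__(self, first: str, second: str) -> None:
        self.active = first
        self.standby = second
        self.down = set()

    def fail(self, node: str) -> None:
        """Mark a node as down; fail over if it was the active one."""
        self.down.add(node)
        if node == self.active and self.standby not in self.down:
            self.active, self.standby = self.standby, self.active

    def recover(self, node: str) -> None:
        """Bring a node back; it keeps whatever role it currently has."""
        self.down.discard(node)
        # Only promote the recovered node if the current active node is down.
        if node == self.standby and self.active in self.down:
            self.active, self.standby = self.standby, self.active

    def status(self) -> dict:
        """Return the service state per node, as the CLI would report it."""
        def state(node: str, role_state: str) -> str:
            return "down" if node in self.down else role_state

        return {
            self.active: state(self.active, "running"),
            self.standby: state(self.standby, "disabled"),
        }


pair = XmppPxGridPair("node1", "node2")
pair.fail("node1")      # node2 takes over
pair.recover("node1")   # node1 returns as standby, node2 stays active
print(pair.status())    # {'node2': 'running', 'node1': 'disabled'}
```

This matches what was observed in the CLI earlier in the thread: one node "running" and the other "disabled", even though both have pxGrid enabled in the GUI.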

Seems kinda *!"#"# when FMC connects to both pxGrid nodes during setup for testing

Correct. HA for pxGrid v1 (XMPP) is active-standby, whereas pxGrid v2 (WebSocket) is active-active. V2 is fairly new, but we hope it will be adopted soon.