Troubleshooting IM&P Presence Topology shows services as UNKNOWN

Introduction

This document provides troubleshooting guidance for a well-known issue on IM&P nodes: the Presence Topology page shows all services as Unknown, even though the CLI command utils service list shows them as started.

 

The Issue

When we open the IM&P Administration webpage and go to System -> Presence Topology to verify the health status of the server, the server may appear in a bad state, i.e., it is displayed with a white cross within a red circle.

 

Here we note the most common reasons for those errors on the Presence Topology webpage.

 

Services Show Their Status as Unknown

Clicking view on one of the affected nodes shows the following on the webpage: the status of the services is Unknown:

imp2.png

 

However, if we open an SSH/CLI session to the IM&P server and run the command utils service list, we see that all those services are actually running (Started state).

 

What actions need to be taken?

The error on the GUI is associated with a Tomcat certificate issue. Verify the following:

 

  1. Make sure that none of your Tomcat and Tomcat-trust certificates are expired. Any expired certificate needs to be regenerated.
  2. If your server uses CA-signed certificates, validate that the whole Tomcat chain is complete. This means that the intermediate and Root certificates must be uploaded as Tomcat-trust to complete the chain if they are missing.

    Here is an example of a missing certificate in the Tomcat chain. In this case, the Tomcat certificate chain consists of only 2 certificates: Root -> Leaf. However, there are scenarios where 3 or more certificates build the chain.

     

    imp3.png

     

    In the image example, the certificate of the Issuer, mexrus-TENOCHTITLAN-CA, is the one that is missing.
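
The two checks above (expiry and chain completeness) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CERTS inventory is hypothetical and would have to be filled in by hand from the certificate details shown in the GUI, the subject and issuer names are taken from the log excerpt in this document, and the expiry date is invented for the example.

```python
from datetime import datetime, timezone

# Hypothetical inventory: one entry per certificate found in the tomcat and
# tomcat-trust stores. Subject and issuer come from the log excerpt in this
# document; the not_after date is made up for illustration.
CERTS = [
    {
        "store": "tomcat",
        "subject": "CN=tenochtitlanCM-ms.mexrus.ru,OU=Collab,O=Cisco,"
                   "L=Mexico,ST=Mexico City,C=MX",
        "issuer": "CN=mexrus-TENOCHTITLAN-CA,DC=mexrus,DC=ru",
        "not_after": datetime(2023, 1, 1, tzinfo=timezone.utc),
    },
]


def expired(certs, now):
    """Return the subjects whose validity period ended before `now`."""
    return [c["subject"] for c in certs if c["not_after"] < now]


def missing_issuers(certs):
    """Return issuers that are referenced but not present as a subject.

    A self-signed root (subject == issuer) closes the chain, so it is
    never reported as missing.
    """
    subjects = {c["subject"] for c in certs}
    return sorted({c["issuer"] for c in certs if c["issuer"] not in subjects})


if __name__ == "__main__":
    now = datetime(2021, 1, 23, tzinfo=timezone.utc)
    print("expired:", expired(CERTS, now))
    print("missing issuers:", missing_issuers(CERTS))
```

With only the leaf certificate in the inventory, the sketch reports the CA as a missing issuer, which matches the situation in the image. Adding the root as a tomcat-trust entry (subject equal to issuer) empties that list.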

Logs required for troubleshooting

Navigate to IM and Presence Serviceability > Trace > Trace Configuration. For Server, select the IM&P Publisher; for Service Group, select Database and Admin Services; for Service, select Cisco IM and Presence Admin. Check Apply to all Nodes, set Debug level to Debug, check the Enable All Trace checkbox, and click Save.

Navigate to IM and Presence Administration > System > Presence Topology and select the node that is affected by the unknown services (note the timestamp).

 

Go to RTMT and gather the following logs:

 

  • Cisco Syslog
  • Cisco Tomcat
  • Cisco Tomcat Security
  • Event Viewer Application Logs
  • Event Viewer System Logs
  • Cisco IM and Presence Admin logs

 

What to expect in the logs:

 

From the cupadminX.log

 

Accessing the Presence Topology > Node panel

 

2021-01-23 17:54:57,036 DEBUG [Thread-137] logging.IMPCommonLogger - IMPSocketFactory: Create socket called with host tenochtitlanIMP.mexrus.ru and port 8443
2021-01-23 17:54:57,040 DEBUG [Thread-137] logging.IMPCommonLogger - Enabled protocols: [TLSv1.1, TLSv1, TLSv1.2]

An exception is thrown because a certificate was not verified.

2021-01-23 17:54:57,087 ERROR [Thread-137] services.ServiceUtil - Got an exception setting up the HTTPS connection.
javax.net.ssl.SSLException: Certificate not verified.
at com.rsa.sslj.x.aH.b(Unknown Source)
at com.rsa.sslj.x.aH.a(Unknown Source)
at com.rsa.sslj.x.aH.a(Unknown Source)
at com.rsa.sslj.x.ap.c(Unknown Source)
at com.rsa.sslj.x.ap.a(Unknown Source)
at com.rsa.sslj.x.ap.j(Unknown Source)
at com.rsa.sslj.x.ap.i(Unknown Source)
at com.rsa.sslj.x.ap.h(Unknown Source)
at com.rsa.sslj.x.aS.startHandshake(Unknown Source)
at com.cisco.cup.services.ServiceUtil.init(ServiceUtil.java:118)
at com.cisco.cup.services.ServiceUtil.getServiceInfo(ServiceUtil.java:197)
at com.cisco.cup.services.ServiceUtil.getServiceInfo(ServiceUtil.java:182)

Attempting to retrieve the Node Status for the Topology

 

at com.cisco.cup.admin.actions.TopologyNodeStatusAction$ServiceRunner.run(TopologyNodeStatusAction.java:358)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.rsa.sslj.x.aK: Certificate not verified.
at com.rsa.sslj.x.bg.a(Unknown Source)
at com.rsa.sslj.x.bg.a(Unknown Source)
at com.rsa.sslj.x.bg.a(Unknown Source)
... 13 more

 

The exception is caused by the missing issuer of the Tomcat certificate

 

Caused by: java.security.cert.CertificateException: Issuer for signed certificate [CN=tenochtitlanCM-ms.mexrus.ru,OU=Collab,O=Cisco,L=Mexico,ST=Mexico City,C=MX] not found: CN=mexrus-TENOCHTITLAN-CA,DC=mexrus,DC=ru
at com.cisco.cup.security.TLSTrustManager.checkServerTrusted(TLSTrustManager.java:309)
at com.rsa.sslj.x.aE.a(Unknown Source)
... 16 more

2021-01-23 17:54:57,087 DEBUG [Thread-137] actions.TopologyNodeStatusAction$ServiceRunner - Retrieved service status for node tenochtitlanIMP.mexrus.ru
2021-01-23 17:54:57,088 DEBUG [http-bio-443-exec-8] actions.TopologyNodeStatusAction - [Topology] VerifyNodeServices - Complete.
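
When the collected cupadmin logs are large, the missing-issuer line can be pulled out with a small script. The following is a minimal sketch, assuming the message has exactly the wording shown in the excerpt above; the function name find_missing_issuers is hypothetical, and the sample line is copied from this document.

```python
import re

# Matches the CertificateException message shown in the excerpt above.
# Group "subject" is the certificate that could not be verified,
# group "issuer" is the CA certificate that was not found.
MISSING_ISSUER = re.compile(
    r"Issuer for signed certificate \[(?P<subject>[^\]]+)\] "
    r"not found: (?P<issuer>.+)$"
)


def find_missing_issuers(lines):
    """Return (subject, issuer) pairs for every matching log line."""
    hits = []
    for line in lines:
        match = MISSING_ISSUER.search(line.rstrip())
        if match:
            hits.append((match.group("subject"), match.group("issuer")))
    return hits


# Sample line copied from the log excerpt in this document.
SAMPLE = [
    "Caused by: java.security.cert.CertificateException: Issuer for signed "
    "certificate [CN=tenochtitlanCM-ms.mexrus.ru,OU=Collab,O=Cisco,L=Mexico,"
    "ST=Mexico City,C=MX] not found: CN=mexrus-TENOCHTITLAN-CA,DC=mexrus,DC=ru",
    "at com.rsa.sslj.x.aE.a(Unknown Source)",
]

if __name__ == "__main__":
    for subject, issuer in find_missing_issuers(SAMPLE):
        print("certificate:", subject)
        print("missing issuer:", issuer)
```

To scan a real file, the same function can be given an open file object in place of SAMPLE, since it only iterates over lines.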

Another possibility that can be found in the same cupadmin logs is the following:

 

Caused by: java.security.cert.CertificateException: Incorrect issuer for server cert
                at com.cisco.cup.security.TLSTrustManager.checkServerTrusted(TLSTrustManager.java:226)
                at com.rsa.sslj.x.aE.a(Unknown Source)
                ... 16 more
2017-10-14 09:04:01,667 ERROR [Thread-125] services.ServiceUtil - Failed to retrieve service status. Reason: Certificate not verified.
javax.net.ssl.SSLException: Certificate not verified.

 

In this case, the IM&P does not recognize the issuer certificate for Tomcat as a valid issuer certificate, most probably because of a corrupted certificate. The options here are:

 

  • Validate the information presented on both the Tomcat and the issuer certificates.
  • Try deleting the issuer certificate from the IM&P and uploading it again.
  • Regenerate the Tomcat certificate and restart the Tomcat service.
    • Keep in mind that if the Tomcat certificate is CA-signed, it will need to be signed again by the CA.
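
The two exception messages covered in this document point to different fixes, so it can help to tag each log line with the matching action. The sketch below is only a convenience under stated assumptions: the SIGNATURES table and the classify function are hypothetical names, the message fragments are copied from the excerpts above, and the advice strings paraphrase the steps in this document.

```python
# Message fragment (as seen in the cupadmin excerpts) -> suggested action.
SIGNATURES = {
    "Issuer for signed certificate":
        "missing issuer: upload the intermediate/Root CA as Tomcat-trust",
    "Incorrect issuer for server cert":
        "issuer not accepted: compare the Tomcat and issuer certificates, "
        "re-upload the issuer, or regenerate the Tomcat certificate",
}


def classify(line):
    """Return the suggested action for a log line, or None if no match."""
    for fragment, advice in SIGNATURES.items():
        if fragment in line:
            return advice
    return None


if __name__ == "__main__":
    line = ("Caused by: java.security.cert.CertificateException: "
            "Incorrect issuer for server cert")
    print(classify(line))
```

Lines that match neither fragment return None, which is the cue to look at the rest of the collected logs rather than assume a certificate problem.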

 

Can it be due to a defect?

 

Lastly, be aware of the following defect: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvu78005

  1. Run the utils diagnose test command on the affected node.
  2. Engage Cisco TAC for further assistance.