
Introduction

 

This document presents the troubleshooting steps to take when one or more services on the Cisco IM & Presence (IM&P) Service server have not started correctly.

 

The States of a service

 

The IM&P Services have the following states:

 

Started - The service is active and running

Starting - The service is in transition from Stopped to Started.

Stopped - The service is not running, either because it was stopped manually or because it is not activated.

Stopping - The service is in transition from Started to Stopped.

 

Always keep in mind that after a reboot of the IM&P node, the following alert will be generated: 

 

The Cisco IM and Presence Data Monitor has detected that database replication is not complete, and/or that the Cisco Sync Agent sync from Cisco Unified Communications Manager is not complete. Some services will remain in the "Starting" state until replication and the Cisco Sync Agent sync are completed.

 

The message does not necessarily mean that the services have remained in the "Starting" state since the alert was generated.

This is expected, as the IM&P Data Monitor begins monitoring the services as soon as the IM&P node boots or comes up from a reboot. The first thing the monitor service detects is that all the main services are in the process of starting, which triggers the message.

Run the command utils service list to confirm that the services are actually Started. If they are, feel free to delete the alert to keep the Notification Alerts clean.
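
As a rough illustration of the check above, the following Python sketch picks out the services that are not in the Started state from text captured from utils service list. The line shape assumed here (service name followed by the state in square brackets, with an optional legend after it) and the function name services_not_started are assumptions made for this example, so adjust the pattern to the output of your version.

```python
import re

# Assumed line shape: "<service name>[<STATE>]  <optional legend>"
SERVICE_LINE = re.compile(r"^(?P<name>.+?)\[(?P<state>[A-Za-z ]+)\]\s*(?P<legend>.*)$")


def services_not_started(output: str) -> dict:
    """Return {service name: state} for every service whose state is not STARTED."""
    pending = {}
    for line in output.splitlines():
        match = SERVICE_LINE.match(line.strip())
        if match is None:
            continue  # skip banners, blank lines and anything unrecognised
        state = match.group("state").strip().upper()
        if state != "STARTED":
            pending[match.group("name").strip()] = state
    return pending


# Hypothetical capture, used only to exercise the parser.
sample = """\
A Cisco DB[STARTED]
Cisco Presence Engine[STARTING]
Cisco XCP Directory Service[STOPPED]  Component is not activated
"""
print(services_not_started(sample))
```

An empty result suggests the alert can be cleared; anything else is the list of services to look at first.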

 

Identifying the problem

 

The first step in troubleshooting services that are not starting is to identify which services have not started.

Whether most of the services are in the Starting state or only some of them, once you have identified the affected services you need to verify whether they depend on one another. This is covered in more detail later.

 

It is important to check the legend that appears to the right of the services that are stopped. The most common ones are:

  • Component not activated: The feature service has not been activated, and it must be activated first.
  • Commanded out of service: This message appears after a reboot of the server if HA was not disabled, or if certain services were restarted and that action caused other services to be stopped manually. The solution is to start the services manually, either from the GUI or the CLI.
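
The two legends above map to different next steps, which can be sketched in Python as follows. The function name suggest_action and the returned strings are hypothetical, and the matching is done on substrings because the exact legend wording shown by your version may differ slightly.

```python
def suggest_action(state: str, legend: str = "") -> str:
    """Map a service state and its legend to the next step described above."""
    state = state.strip().upper()
    legend = legend.strip().lower()
    if state == "STARTED":
        return "nothing to do"
    if "not activated" in legend:
        # "Component not activated": the feature service must be activated first.
        return "activate the feature service first"
    if "out of service" in legend:
        # "Commanded out of service": the service was stopped and must be started by hand.
        return "start the service manually from the GUI or CLI"
    return "investigate further"


print(suggest_action("STOPPED", "Component is not activated"))
print(suggest_action("STOPPED", "Commanded Out of Service"))
```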

 

Services remaining on starting state

 

One of the most common issues found on the IM&P Subscriber after a restart is that almost all of the services are in the Starting state, while the IM&P Publisher shows all of its services as Started.

The usual cause of this behaviour is a restart of the IM&P Subscriber without first disabling High Availability in the Presence Redundancy Groups.

To fix this problem, do the following:

 

Step 1. Disable HA

Step 2. Run the following command on both IM&P nodes:

  • set replication-sync monitor disable

Keep in mind that this command is not service impacting. It only disables the monitor service between the nodes, which allows the services to start.

 

Step 3. Wait around 5 minutes and run utils service list again to confirm that the services are now Started.

Step 4. Once all the services are Started on the Subscriber, run the following command on both IM&P nodes:

  • set replication-sync monitor enable

 

Step 5. Re-enable the High Availability from the Presence Redundancy Groups
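
The "wait around 5 minutes and check again" part of Step 3 can be sketched as a small polling loop in Python. This is only an illustration: wait_until_started, its parameters and the get_states callable (which you would implement yourself, for example by capturing utils service list over your own CLI session) are all hypothetical names, not part of any Cisco tool.

```python
import time
from typing import Callable, Dict


def wait_until_started(
    get_states: Callable[[], Dict[str, str]],
    timeout_s: float = 300.0,   # roughly the 5 minutes suggested in Step 3
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_states() until every service reports STARTED or the timeout expires."""
    waited = 0.0
    while True:
        states = get_states()
        if states and all(s.upper() == "STARTED" for s in states.values()):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Example with canned snapshots instead of a live node.
snapshots = iter([
    {"Cisco XCP Router": "STARTING"},
    {"Cisco XCP Router": "STARTED"},
])
print(wait_until_started(lambda: next(snapshots), sleep=lambda _: None))
```

Only once this returns True would you move on to re-enabling the replication-sync monitor and HA (Steps 4 and 5).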

 

Specific services not starting

 

Network services not starting

 

Although uncommon, there have been scenarios where some network services do not start on the IM&P Publisher. These are:

 

  • Cisco Client Profile Agent
  • Cisco XCP Router
  • Cisco XCP Config Manager
  • Cisco Route and Presence Datastores

 

Impact: The XCP, Presence Engine and SIP Proxy services will not start, as they depend on the network services listed. As a result, the IMDB does not replicate and Jabber users are unable to log in.

 

Solution:

 

Step 1. Disable HA

Step 2. Start each service manually in the following order:

  • Cisco Client Profile Agent
  • Cisco Route Datastore
  • Cisco Presence Datastore
  • Cisco XCP Config Manager
  • Cisco XCP Router

 

Keep in mind

 

  • For the Cisco Client Profile Agent to start, the Cisco Tomcat service must already be started.
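
The start order from Step 2, together with the Tomcat prerequisite, can be expressed as a short Python helper that builds the list of CLI commands to run. The helper name start_commands and the states dictionary are assumptions for this sketch, and a service missing from the dictionary is treated as not started.

```python
# Order taken from Step 2 above.
START_ORDER = [
    "Cisco Client Profile Agent",
    "Cisco Route Datastore",
    "Cisco Presence Datastore",
    "Cisco XCP Config Manager",
    "Cisco XCP Router",
]


def start_commands(states: dict) -> list:
    """Build `utils service start` commands, in order, for services not yet STARTED."""
    commands = []
    # Cisco Client Profile Agent depends on Cisco Tomcat, so Tomcat comes first.
    if states.get("Cisco Tomcat", "").upper() != "STARTED":
        commands.append("utils service start Cisco Tomcat")
    for name in START_ORDER:
        if states.get(name, "").upper() != "STARTED":
            commands.append(f"utils service start {name}")
    return commands


# Hypothetical snapshot of service states.
states = {
    "Cisco Tomcat": "STARTED",
    "Cisco Client Profile Agent": "STARTED",
    "Cisco Route Datastore": "STOPPED",
    "Cisco Presence Datastore": "STOPPED",
    "Cisco XCP Config Manager": "STARTED",
    "Cisco XCP Router": "STARTING",
}
for command in start_commands(states):
    print(command)
```

Run each command only after the previous service has reached the Started state.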

 

If the previous steps do not work, open a TAC case for further troubleshooting. Keep in mind that the following outputs and logs will be required, and that some traces need to be set to debug before reproducing the issue (that is, before attempting to start the service manually).

 

  1. CLI Outputs
    1. show network cluster
    2. utils dbreplication runtimestate
    3. utils ha status
    4. utils core active list
    5. utils service list
  2. Logs/ Traces
    1. Cisco Syslog Agent
    2. Event Viewer-Application Log
    3. Event Viewer-System Log
    4. Any of the traces from the services that remain stopped.

 

 

A Cisco DB not starting

 

This is one of the main services within the system.

 

Impact: Certain features on the server web page cannot be accessed, Jabber users and their features might be affected, and IDS DB replication breaks.

 

Causes: The most common causes identified for this issue are:

  • Change of the hostname, IP address or domain without following the Cisco Guidelines
  • Corruption of the files after an ungraceful shutdown of the system

 

Solution: Unfortunately, there is no straightforward procedure for when this service does not start. The suggestions are:

 

Step 1. Disable HA

Step 2. Restart A Cisco DB replicator

Step 3. Restart A Cisco DB. If it remains in the Starting state, try stopping it and then starting it.

 

The best approach here is to engage Cisco TAC for further investigation. They will require the following information:

 

  1. CLI Outputs:
    1. show tech network hosts
    2. show tech database dump
    3. show tech dbintegrity
    4. utils create report database
    5. utils network connectivity IM&P_node 1500
    6. show network cluster
    7. utils core active list
  2. Logs or Traces:
    1. Cisco Database Layer Monitor
    2. Cisco Database Library Trace
    3. Cisco Database Notification Service
    4. Cisco Database Replicator Trace
    5. Cisco Informix Database Service
    6. Cisco Syslog Agent
    7. Event Viewer-Application Log
    8. Event Viewer-System Log
  3. A description of any recent changes made.

 

Cisco Intercluster Sync Agent not starting

 

Impact: The IM&P database won't be synchronized across the IM&P nodes and IM&P clusters (Inter-cluster peering)

 

If the Cisco Intercluster Sync Agent (ICSA) service is not starting, the main possible reasons are:

 

  1. High Availability is in a bad (or wrong) state and is not allowing the service to come up.
    1. You will need to disable HA, start the service and then re-enable HA.

 

  2. You might be hitting one of these two bugs:
    1. https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvj09515/?rfs=iqvred
    2. https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvq63308

 

If restarting the node or the fix for reason 1 does not bring the service up, open a TAC case for further troubleshooting. Keep in mind that the following outputs and logs will be required, and that some traces need to be set to debug before reproducing the issue (that is, before attempting to start the service manually).

 

  1. CLI Outputs
    1. show network cluster
    2. utils dbreplication runtimestate
    3. utils ha status
    4. utils core active list
    5. utils service list
  2. Logs/ Traces
    1. Cisco Syslog Agent
    2. Event Viewer-Application Log
    3. Event Viewer-System Log
    4. Cisco Service Recovery Manager
    5. Cisco Intercluster Sync Agent Service

 

Presence Engine not starting

 

For the Cisco Presence Engine service, there are several scenarios to take into account to understand why it is not starting and how to make it start.

 

  1. Use the utils service list command to check that the following services are running. If they are not, they must be started first:
    1. Cisco Presence Datastore
    2. Cisco SIP Proxy
    3. Cisco XCP Router
    4. Cisco Sync Agent

 

  1. The most common reason the Cisco Presence Engine (PE) service does not start on the IM&P Subscriber is that the IM&P Subscriber has not been added to the presence redundancy group (PRG).
    • Reason: The PE service is tied to the PRG, and the node must be added to the PRG for the service to start.
    • Solution: Add the server to the PRG and wait around 5 minutes to see if the service starts.
    • Variants: It is possible that, after doing this, the PE stops on both IM&P nodes. In that case, the solution is the following:

 

Step 1. Keep the IM&P Sub in the PRG

Step 2. Disable HA

Step 3. Restart the Cisco SIP Proxy service first and wait until it starts.

Step 4. Restart the Cisco PE service and wait until it starts.

Step 5. Perform steps 3 and 4 on the IM&P Publisher first, then on the Subscriber.

 

2. If the IM&P Subscriber has already been added to the PRG and the PE remains in the Stopped or Starting state, this could be related to a DB replication mismatch between the two IM&P nodes. Execute the following query: run sql select * from enterprisenode

The query displays the id of the node, the subclusterid of the node (which is the PRG id), the name or IP address, and other values. The point to focus on is that both IM&P nodes share the same subclusterid value.

  • Reason: If the DB replication did not complete correctly, the IM&P Subscriber displays the subclusterid as NULL.
  • Solution: Run the following query: run sql update enterprisenode set subclusterid=subclusterid_value_as_for_the_IM&P_Pub where id=IM&P_Sub_id, then rerun the run sql select * from enterprisenode query and make sure the subclusterid has the correct (the same) value for both IM&P nodes. The service might start on its own within the next 5 minutes, or you can try to start it manually.
  • Recommendation: Open a Cisco TAC case to perform this change.
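
The comparison described above can be sketched in Python once the rows returned by the enterprisenode query have been copied into dictionaries. The function name, the publisher_name parameter and the sample values are hypothetical; only the id, name and subclusterid columns mentioned above are assumed, with None standing in for NULL.

```python
def nodes_with_wrong_subcluster(rows: list, publisher_name: str) -> list:
    """Return names of nodes whose subclusterid differs from the publisher's (None means NULL)."""
    publisher = next(row for row in rows if row["name"] == publisher_name)
    expected = publisher["subclusterid"]
    if expected is None:
        raise ValueError("publisher has no subclusterid; is it in a PRG?")
    return [
        row["name"]
        for row in rows
        if row["name"] != publisher_name and row["subclusterid"] != expected
    ]


# Hypothetical rows: the subscriber shows NULL, as in the failure case described above.
rows = [
    {"id": 2, "name": "imp-pub.example.com", "subclusterid": "prg-1"},
    {"id": 3, "name": "imp-sub.example.com", "subclusterid": None},
]
print(nodes_with_wrong_subcluster(rows, "imp-pub.example.com"))
```

The sketch only reports the mismatch; the update itself is best done with TAC, as recommended above.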

  

3. If all services are started except for the PE, and checks 1 and 2 have been verified:

    1. Solution:

Step 1. Run set replication-sync monitor disable on both IM&P nodes.

Step 2. Wait around 5 minutes. If the service has not started, attempt to start it manually with utils service start Cisco Presence Engine

Step 3. Run set replication-sync monitor enable (whether or not the service started).

 

4. If checks 1, 2 and 3 did not make the PE service start, you might be facing one of two scenarios, both of which require remote account access to the server to validate.

Scenario 1: Validate the PE process.

Scenario 2: If you are running version 12.5, it is highly probable that you are hitting the following defect: CSCvg94247

 

Therefore, perform the following:

 

Step 1. Make sure the Cisco Presence Engine service trace is set to debug.

Step 2. Attempt checks 1, 2 and 3 above again.

- If you find discrepancies in check 2, you might want to have TAC on the call.

Step 3. If the service remains in the Starting state after the above steps, collect the following logs for the timeframe of the attempt and engage Cisco TAC:

  1. Cisco Presence Engine
  2. Event viewer application and system logs
  3. Also collect the following outputs from the CLI:
    • utils service list
    • utils dbreplication runtimestate
    • utils imdb_replication status
    • utils diagnose test
    • run sql select * from enterprisenode

 

Cisco Sync Agent not starting

 

Impact: Synchronization of DB Tables from CUCM to IM&P will not be completed, impacting mainly the end-user synchronization across the cluster.

 

Solution: Review the following checklist.

 

  1. Verify that both the CUCM and IM & Presence nodes are running the same version. If you are using version 11.X or later, the servers must also be running the same SU version.
    • If they are not, bring both to the same version.
  2. Verify that the AXL Web Service on CUCM is running.
    • If it is not, start the AXL Web Service.
  3. Verify that the IM & Presence node is listed in the Server List on CUCM.
    • If it is not, a rebuild of the IM&P server will be needed. Adding the server list entry back will have no effect, because a specific ID is generated for every entry added, so the IM & Presence node would remain with the old one.
  4. Verify that the troubleshooter tests for the CUCM Publisher on the IM&P are passing.
    • Different troubleshooting should be followed depending on the errors seen.
  5. Verify that the following URL is reachable: https://CUCM_OR_IM&P_FQDN_OR_IP/axl/
  6. Attempt rebooting the CUCM Publisher and then the IM&P Publisher.
    • Keep in mind that HA must be disabled before performing this step.
  7. Run the following CLI query on the IM&P Publisher:
    • run sql select * from epassyncagentcfg
  8. Confirm that the ccmpublisherip address displayed belongs to the CUCM Publisher.
  9. Run the following query on the CUCM:
    • run sql select applicationuser.pkid, applicationuser.name, credential.credentials from applicationuser inner join credential on applicationuser.pkid=credential.fkapplicationuser where credential.tkcredential=3 and applicationuser.name='axluser_displayed_from_epassyncagentcfg'
  10. Confirm that:
    • username (On CUCM) = axluser (On IM&P)
    • pkid (On CUCM) = cucm_axluser_pkid (On IM&P)
    • credentials (On CUCM) = axlpassword (On IM&P)
  11. If the axluser in epassyncagentcfg cannot be found in the CUCM user list, create a new application user on the CUCM side with the same name as the old axluser, providing the previous password if known.
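
The three comparisons in the checklist (username, pkid and credentials) can be sketched in Python once one row from each query has been copied into a dictionary. The function name axl_user_mismatches and the sample values are hypothetical; the keys follow the column names used in the queries above, with the CUCM username coming from the applicationuser.name column.

```python
# (CUCM column, IM&P epassyncagentcfg column) pairs that must hold the same value.
FIELD_PAIRS = [
    ("name", "axluser"),
    ("pkid", "cucm_axluser_pkid"),
    ("credentials", "axlpassword"),
]


def axl_user_mismatches(imp_cfg: dict, cucm_user: dict) -> list:
    """Return a description of every field pair that is missing or does not match."""
    problems = []
    for cucm_key, imp_key in FIELD_PAIRS:
        if cucm_key not in cucm_user or imp_key not in imp_cfg:
            problems.append(f"{cucm_key} (CUCM) / {imp_key} (IM&P): value missing")
        elif cucm_user[cucm_key] != imp_cfg[imp_key]:
            problems.append(f"{cucm_key} (CUCM) != {imp_key} (IM&P)")
    return problems


# Hypothetical rows: the pkid on the IM&P side no longer matches the CUCM user.
imp_cfg = {"axluser": "axl_sync", "cucm_axluser_pkid": "aaaa-1111", "axlpassword": "0f3c"}
cucm_user = {"name": "axl_sync", "pkid": "bbbb-2222", "credentials": "0f3c"}
print(axl_user_mismatches(imp_cfg, cucm_user))
```

An empty list means the AXL user seen by the Sync Agent matches the application user on CUCM.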

 

If the above actions do not solve the problem, you will need to engage Cisco TAC for further troubleshooting. Keep in mind that the following outputs and logs will be required, and that some traces need to be set to debug before reproducing the issue (that is, before attempting to start the service manually).

 

  1. CLI Outputs (from CUCM Publisher and IM&P)
    1. show network cluster
    2. utils dbreplication runtimestate
    3. utils ha status
    4. utils core active list
    5. utils service list
    6. run sql select * from epassyncagentcfg (Only on the IM&P)
    7. run sql select applicationuser.pkid, applicationuser.name, credential.credentials from applicationuser inner join credential on applicationuser.pkid=credential.fkapplicationuser where credential.tkcredential=3 and applicationuser.name='axluser_displayed_from_epassyncagentcfg' (Only on the CUCM)
  2. Logs/ Traces
    1. Cisco Syslog Agent
    2. Event Viewer-Application Log
    3. Event Viewer-System Log
    4. Cisco Sync Agent
    5. Cisco AXL Web Service

 

Feature Services not starting

 

These services (Cisco XCP Directory Service, Cisco XCP File Transfer Manager, Cisco XCP Message Archiver and Cisco XCP XMPP Federation Connection Manager) remain stopped by default unless you use the feature that each service supports.

Even if your IM&P shows those services as activated, they will not start unless you configure the corresponding feature for each service.

 

For instance:

 

Cisco XCP Directory Service

 

The Cisco XCP Directory Service supports the integration of XMPP clients with the LDAP directory to allow users to search and add contacts from the LDAP directory.

To start this service you need to configure LDAP search settings for third-party XMPP clients (Choose Cisco Unified CM IM and Presence Administration > Application > Third-Party Clients > Third-Party LDAP Settings.)

You use Cisco XCP Directory Service to allow users of a third-party XMPP client to search and add contacts from the LDAP directory.

If you turn on the Cisco XCP Directory Service but do not configure the LDAP server and the LDAP search settings for third-party XMPP clients, the service will start and then stop again.

To configure third-party XMPP directory:

https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cucm/im_presence/configAdminGuide/10_0_1/CUP0_BK...

 

Cisco XCP File Transfer Manager

 

This service allows you to use a server-side file transfer solution called managed file transfer (MFT).

MFT allows an IM and Presence Service client, such as Cisco Jabber, to transfer files to other users, ad hoc group chats and persistent chats.

The service will not start if the configuration for MFT is not in place.

To activate and use MFT:

https://community.cisco.com/t5/collaboration-voice-and-video/how-to-configure-managed-file-transfer-...

 

Cisco XCP Message Archiver

 

The Cisco XCP Message Archiver service supports the IM Compliance feature. The IM Compliance feature logs all messages sent to and from the IM and Presence server, including point-to-point messages and messages from ad hoc (temporary) and permanent chat rooms for the Chat feature. Messages are logged to an external Cisco-supported database.

The service will not start if the configuration for compliance is not in place.

How to use message archiver:

https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cucm/im_presence/im_compliance/9_1_1/CUP0_BK_I8F...

 

Cisco XCP XMPP Federation Connection Manager

 

The Cisco XCP XMPP Federation Connection Manager supports interdomain federation over the XMPP protocol with third-party enterprises such as IBM Lotus Sametime, Cisco Webex Meeting Center and GoogleTalk, as well as with another IM and Presence enterprise.

Again, this service will not start until XMPP federation is configured.

How to configure XMPP federation:

https://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cucm/im_presence/interdomain_federation/9_0_1/CU...

Comments
miket

I had an issue where the Sync Agent would not start. It turned out that the root/intermediate certs had expired. I uploaded new ones and it started right away.
