Prime NCS Failed to start

wirelessman · 06-08-2015

Hello, experts!!!

We have two Prime NCS servers set up with HA.

Both suddenly went down. After three restarts, ncs stat shows the following:

Cisco-NCS-Pri/admin# ncs stat
Health Monitor is running, with an error.
initHealthMonitor(): can not start DB
Ftp Server is Stopped
Database server is stopped
Tftp Server is Stopped
Matlab Server is Stopped
NMS Server is stopped.
CNS Gateway with port 11011 is down
CNS Gateway SSL with port 11012 is down
CNS Gateway with port 11013 is down
CNS Gateway SSL with port 11014 is down
Plug and Play Gateway Broker with port 61617 is down
Plug and Play Gateway config, image and resource are down on https
Plug and Play Gateway config, image and resource are down on http
Plug and Play Gateway is stopped.
SAM Daemon is not running ...
DA Daemon is not running ...
Syslog Daemon is not running ...
Compliance engine is not running
Cisco-NCS-Pri/admin#

What can be done to restore them to their last known state?
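
For anyone triaging similar output, here is a minimal Python sketch that groups the lines of a pasted ncs status listing into down, up, and informational entries. It is not a Cisco tool. The phrase lists are an assumption based only on the wording seen in this thread, and the names classify and summarize are hypothetical.

```python
import re

# Assumption: these phrases are taken from the status lines pasted in this
# thread; other releases may word their status lines differently.
DOWN = re.compile(
    r"\b(is stopped|is down|are down|is not running|with an error)\b",
    re.IGNORECASE,
)
UP = re.compile(r"\b(is running|are running)\b", re.IGNORECASE)


def classify(line):
    """Return 'down', 'up', 'info', or None for blank and prompt lines."""
    text = line.strip()
    if not text or "/admin#" in text:
        return None  # skip empty lines and the CLI prompt
    if DOWN.search(text):
        return "down"  # checked first so "running, with an error" counts as down
    if UP.search(text):
        return "up"
    return "info"  # e.g. the initHealthMonitor() message


def summarize(output):
    """Group every status line of the pasted output by its state."""
    groups = {"down": [], "up": [], "info": []}
    for line in output.splitlines():
        state = classify(line)
        if state is not None:
            groups[state].append(line.strip())
    return groups


# A few lines copied from the output above.
SAMPLE = """Cisco-NCS-Pri/admin# ncs stat
Health Monitor is running, with an error.
initHealthMonitor(): can not start DB
Ftp Server is Stopped
Database server is stopped
SAM Daemon is not running ...
Cisco-NCS-Pri/admin#
"""

if __name__ == "__main__":
    for state, lines in summarize(SAMPLE).items():
        print(state, len(lines))
```

In the output above, the informational line (can not start DB) is the useful one: every other service is down because the database did not start.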

gohussai · 06-08-2015

I found the following links; please check them.

http://www.cisco.com/c/en/us/td/docs/net_mgmt/prime/infrastructure/1-2/user/guide/prime_infra_ug/maint_sys_health.html#wp1070549

http://www.cisco.com/c/en/us/td/docs/net_mgmt/prime/infrastructure/2-0/administrator/guide/PIAdminBook/config_HA.html

wirelessman · 06-09-2015

We have backed up the *.gpg files, including the most recent ones.

If I issue ncs db reinitdb:

1. Will the following stopped services be restored to operational states?

Health Monitor is running, with an error.
runHealthMonitor(): failed to start Database
Ftp Server is Stopped
Database server is stopped
Tftp Server is Stopped
Matlab Server is Stopped
NMS Server is stopped.
CNS Gateway with port 11011 is down
CNS Gateway SSL with port 11012 is down
CNS Gateway with port 11013 is down
CNS Gateway SSL with port 11014 is down
Plug and Play Gateway Broker with port 61617 is down
Plug and Play Gateway config, image and resource are down on https
Plug and Play Gateway config, image and resource are down on http
Plug and Play Gateway is stopped.
SAM Daemon is not running ...
DA Daemon is not running ...
Syslog Daemon is not running ...
Compliance engine is not running

2. Will the maps and the plotted APs be available again?

3. Will the discovered devices’ backup files still be intact?

BR’s,

Neil

mohanak · 06-09-2015

PI 2.2 not coming up after restoring the backup from 2.1

CSCut40331

Description

Symptom:
NMS process not starting after restoring the PI 2.1 backup.
PI22-pro-234/admin# ncs status
Health Monitor is running, with an error.
failed to start PI on startup Health Monitor
Matlab Server Instance 1 is running
Ftp Server is running
Database server is running
Matlab Server is running
Tftp Server is running
NMS Server is stopped.
Matlab Server Instance 2 is running
CNS Gateway with port 11011 is down
CNS Gateway SSL with port 11012 is down
CNS Gateway with port 11013 is down
CNS Gateway SSL with port 11014 is down
Plug and Play Gateway Broker with port 61617 is down
Plug and Play Gateway config, image and resource are down on https
Plug and Play Gateway is stopped.
SAM Daemon is running ...
DA Daemon is running ...

Conditions:
Restoring the PI 2.1 backup

Workaround:
Contact the TAC to schedule a WebEx session to have a workaround implemented.

Further Problem Description:
log4j: Adding appender named [LogFileAppenderAEMSConfiguration] to category [com.cisco.aems.utils].
deviceStatusUpdateHook - Object: com.cisco.ifm.inventoryserviceimpl.DeviceStatusUpdateHook@688fbfea
[ResourceClassLoader@27524a91] warning at Type 'IfmConfigTemplatesRestVirtualDomainFilter' (no debug info available)::0 no match for this type name: com.cisco.ifm.template.importExport [Xlint:invalidAbsoluteTypeName]
[ResourceClassLoader@33d74da4] warning at Type 'IfmConfigTemplatesRestVirtualDomainFilter' (no debug info available)::0 no match for this type name: com.cisco.ifm.template.importExport [Xlint:invalidAbsoluteTypeName]
[ResourceClassLoader@46740ca0] warning at Type 'IfmConfigTemplatesRestVirtualDomainFilter' (no debug info available)::0 no match for this type name: com.cisco.ifm.template.importExport [Xlint:invalidAbsoluteTypeName]
Stopping
[ResourceClassLoader@45c3571c] warning at Type 'IfmConfigTemplatesRestVirtualDomainFilter' (no debug info available)::0 no match for this type name: com.cisco.ifm.template.importExport [Xlint:invalidAbsoluteTypeName]
Application context could not be created. Will now exit
###STARTUP FAILED###
org.springframework.beans.factory.access.BootstrapException: Unable to return specified BeanFactory instance: factory key [applicationContext-main], from group with resource name [classpath*:beanRefContext.xml]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'applicationContext-main' defined in URL [file:/opt/CSCOlumos/conf/beanRefContext.xml]: Instantiation of bean failed; nested exception is org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [org.springframework.context.support.ClassPathXmlApplicationContext]: Constructor threw exception; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'deploymentHandler': Invocation of init method failed; nested exception is java.lang.NullPointerException
Related cause: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'mapService.createInstance' defined in class path resource [META-INF/spring/rfm-application-context.xml]: Invocation of init method failed; nested exception is java.lang.NullPointerException
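
The long Spring message above is easier to read once it is split at each "nested exception is" marker. Here is a minimal Python sketch of that, using the "Related cause" line from the log as sample input. The helper names (exception_chain, root_cause, failing_beans) are hypothetical and only for illustration.

```python
import re

# Spring joins each wrapped exception with this marker.
MARKER = "; nested exception is "


def exception_chain(message):
    """Split one Spring error message into its chain, outermost first."""
    return [part.strip() for part in message.split(MARKER)]


def root_cause(message):
    """Return the innermost exception text, i.e. the last link in the chain."""
    return exception_chain(message)[-1]


def failing_beans(message):
    """List the bean names mentioned in 'Error creating bean with name' parts."""
    return re.findall(r"Error creating bean with name '([^']+)'", message)


# Copied from the "Related cause" line in the log above.
RELATED_CAUSE = (
    "Related cause: org.springframework.beans.factory.BeanCreationException: "
    "Error creating bean with name 'mapService.createInstance' defined in "
    "class path resource [META-INF/spring/rfm-application-context.xml]: "
    "Invocation of init method failed; nested exception is "
    "java.lang.NullPointerException"
)

if __name__ == "__main__":
    print("root cause:", root_cause(RELATED_CAUSE))
    print("beans:", failing_beans(RELATED_CAUSE))
```

Applied to the full STARTUP FAILED line, the same split ends in java.lang.NullPointerException raised while initialising the deploymentHandler bean.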

sobhardw · 06-10-2015

Hello Neil,

The following issues can occur in a high-availability environment:

    The primary or secondary Prime Infrastructure goes down during the high-availability registration process.
    The primary or secondary Prime Infrastructure goes down during the failback process.
    The secondary Prime Infrastructure goes down during the failover process.

A possible cause of these issues is that the database or the NMS server has failed to start.

Will the following stopped services be restored to operational states?

Failover should be considered temporary. The failed primary Prime Infrastructure should be restored to normal as soon as possible, and failback should then be initiated. The longer it takes to restore the failed primary Prime Infrastructure, the longer the other Prime Infrastructure sharing that secondary Prime Infrastructure must run without failover support.

Will the discovered devices’ backup files still be intact?

Yes, the backup files will remain intact.

1. Make sure that you have a backup before starting the high-availability registration or initiating the failback process.

2. If there is any issue with starting the database or the process, complete the following in the primary Prime Infrastructure:

a. Run the following command to re-create a new database:
/opt/CSCOlumos/bin/dbmigrate.sh recreateDB

Alternatively, run the following command in the admin console to re-create the database:

ncs run reset db

b. Run the following command to remove the existing database:
rm /opt/CSCOlumos/.dbCreated

c. Stop all the processes.

d. Start all the processes.

e. Restore the backup and continue with the high-availability registration.

For more information, please see the link below:

http://www.cisco.com/c/en/us/td/docs/net_mgmt/prime/infrastructure/2-0/administrator/guide/PIAdminBook/config_aps.html
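
Steps a to e above can be written down as a dry-run checklist before touching the server. The Python sketch below only prints the plan and executes nothing. The commands for steps a and b are the ones quoted above; ncs stop and ncs start for steps c and d are an assumption about what "stop/start all the processes" means in the admin console; step e is left without a command because the backup and repository names are site-specific. Step and render are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    description: str
    command: str = ""  # left empty when this thread gives no exact command


# The order follows steps a to e as posted above.
RECOVERY_STEPS = [
    Step("a", "Re-create the database (admin console form)", "ncs run reset db"),
    Step("b", "Remove the existing database", "rm /opt/CSCOlumos/.dbCreated"),
    Step("c", "Stop all the processes", "ncs stop"),    # assumption: admin CLI
    Step("d", "Start all the processes", "ncs start"),  # assumption: admin CLI
    Step("e", "Restore the backup, then continue with the high-availability registration"),
]


def render(steps):
    """Return one printable line per step; nothing is executed."""
    lines = []
    for step in steps:
        suffix = f"  ->  {step.command}" if step.command else ""
        lines.append(f"{step.label}. {step.description}{suffix}")
    return lines


if __name__ == "__main__":
    print("\n".join(render(RECOVERY_STEPS)))
```

Confirm that a verified backup exists (step 1) before running anything from this list, since re-creating the database discards its current contents.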