APIC-EM does not restart after restore

Nathan Sowatskey
Cisco Employee

Hi

I experimented with the backup and restore functions of APIC-EM today.

I created a backup, which seemed to be fine, and then I attempted to restore it.

APIC-EM seemed to get as far as the "Starting Controller (3/4)" stage, and then no further for over 30 minutes.

I then restarted the APIC-EM VM, and APIC-EM eventually, kind of, came back such that I could log in. But when I tried to view device inventory, I was presented with a message that the inventory service had not started. That lasted for over 30 min also, then I tried a restore again, and I am looking at the same "3/4" stage message, and it has been there ever since for the last four hours.

I suspect that this is broken.

I shall reinstall APIC-EM, and keep the old VM in case anyone wants to try and debug this.

Regards

Nathan

14 Replies

rnaggi
Cisco Employee

Nathan,

Could you let me know which APIC-EM version was running on Day-0 and which version you were trying to upgrade to? I will try to find an answer based on your issue.

Hi Rohan

I am using 1.2GA. I am not trying to upgrade APIC-EM.

Regards

Nathan

Would it be possible for you to share the RCA logs?

1. SSH to the APIC-EM cluster

2. su root

3. RCA

This will create a tar.gz file.

We can look at the log file and see why this is happening.

rca_command.tiff

Rohan

The attached image shows my CLI session where I have attempted to follow your instructions. It did not work.

What do you suggest please?

Many thanks

Nathan

Hi Nathan,

For a problem I had myself, I was also asked to run the RCA command.

This is what I did to get the file:

  1. Log in to the server CLI as the user grapevine, with the password you set during the installation.
  2. You then see a prompt ($ in my case).
  3. Type: rca (lowercase)
  4. Enter the password of the user grapevine again.
  5. A tar file is created with the name "grapevine-root-{xxx.xxx.xxx.xxx}-rca-2016-03-16_20-39-16_UTC+0000"
  6. The file is stored in the "root/tmp" directory; you can use WinSCP (as the grapevine user) to grab the file to your local disk.


I hope it also works for you.

Please let me know if it works.


Greetings

Palermo

Thank you, a variation of this worked

Rohan

Please see the email I have sent to you from my nathan@nathan.to account.

Many thanks

Nathan

Rohan

Over the weekend I ran this test again. I had restored the APIC-EM instance to the default snapshot, which was taken after initial install. Note that the initial install was based on a database restore using the following instructions:

"How to load the pre-populated DB on the APIC-EM 1.2 code:

1. Download both the files from the following box folder and copy it to the sandbox APIC-EM /tmp directory.

link sent separately in email

2. Cd to the /tmp directory - "cd /tmp"

3. Untar the backup.tar file – "sudo tar -zxvf backup.tar" (you should see the /tmp/backup created folder on your APIC-EM)

4. Run the following command to load this DB: "sudo java -jar /tmp/dbrecovery-4.0.0-SNAPSHOT.jar restore /tmp/backup"

Before you do this, be sure to run reset_grapevine on this APIC-EM to ensure that the DB is clean."

After reverting to the snapshot, I did the following:

- Created a discovery job.

- Ran the discovery and discovered a CSR1Kv running in my test lab.

- Backed up the system

- Restored from the latest backup

The restore got as far as stage 3/4 and has been there for ~24 hours.

So, this seems to be replicable and consistent.

Regards

Nathan

Nathan,

I have forwarded the RCA to the dev team for further debugging. I have also sent a separate email to the dev team and copied you on it.

As a next step, we will wait for their answer.

Could you also try (if time permits) creating a new VM with an APIC-EM cluster, doing a fresh install, and then trying backup/restore? We (the engineering team) have tried backup/restore, and it seems to be working for us with the 1.2.x build.

Hope all this helps you to move forward

rnaggi
Cisco Employee

Nathan,

Here is the response from the dev team. NTP was the issue.

There is an NTP connectivity error:

Error occurred while running restore operation

[task_id=c444b23a-4a7d-11e6-be89-0a01c0a8010d]: Unable to configure NTP.

Please confirm NTP server connectectivity and settings.

You can check /var/log/grapevine_manager_activity.log and /var/log/grapevine_manager.log for restore logs.
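
For anyone hitting the same "3/4" hang, a minimal sketch of how those two logs could be scanned for NTP-related lines is below. The function name `find_ntp_errors` is hypothetical (it is not an APIC-EM command), and the sketch assumes you are logged in on the APIC-EM host with read access to the log files; only the log paths come from the message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of APIC-EM): print every line mentioning
# NTP, prefixed with file name and line number, from the given log files.
find_ntp_errors() {
  for log in "$@"; do
    if [ -r "$log" ]; then
      # -H: show file name, -n: show line number, -i: case-insensitive.
      # "|| true" keeps going when a log has no NTP lines at all.
      grep -H -n -i 'ntp' "$log" || true
    else
      echo "skipped (not readable): $log" >&2
    fi
  done
}

# Log paths as named in the restore error message.
find_ntp_errors /var/log/grapevine_manager_activity.log \
                /var/log/grapevine_manager.log
```

A line such as "Unable to configure NTP." in the output would point at the same cause as in this thread; whether the configured NTP server is actually reachable from the VM still has to be checked separately.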

Once NTP was fixed in the lab, the restore worked.

Many thanks

Nathan

Hi Nathan,

Did you do anything (on APIC-EM) after configuring NTP server on your network?

Thanks!

Hi

After configuring NTP, all was fine.

Many thanks

Nathan

Thanks again Nathan,

In my case it was necessary to run config_wizard again, reboot, and wait more than 1 h for APIC-EM to recover.

This is my log (sorry if it is too long):

Note: apic-em-jboss-ejbca takes 20 min to come up.

2016-10-18 08:53:04,221 |    Running [4/37]: grapevine grapevine-coordinator-service rabbitmq cassandra

2016-10-18 08:53:19,338 |    Running [5/37]: router

2016-10-18 08:54:24,890 |    Running [6/37]: reverse-proxy

2016-10-18 08:54:45,068 |    Running [8/37]: postgres election-service

2016-10-18 08:54:50,114 |    Running [9/37]: log-aggregator

2016-10-18 08:55:15,356 |    Running [10/37]: cas-service

2016-10-18 08:56:10,829 |    Running [11/37]: node-ui

2016-10-18 08:56:56,301 |    Running [12/37]: rbac-service

2016-10-18 08:58:17,093 |    Running [13/37]: ipgeo-service

2016-10-18 09:01:59,413 |    Running [14/37]: remote-ras

2016-10-18 09:06:06,965 |    Running [15/37]: topology-service

2016-10-18 09:06:17,057 |    Running [16/37]: scheduler-service

2016-10-18 09:08:23,342 |    Running [17/37]: policy-manager-service

2016-10-18 09:08:28,413 |    Running [18/37]: apic-em-pki-broker-service

2016-10-18 09:09:54,862 |    Running [19/37]: task-service

2016-10-18 09:10:12,619 |    Running [21/37]: pnp-service access-policy-programmer-service

2016-10-18 09:10:25,549 |    Running [22/37]: telemetry-service

2016-10-18 09:10:44,655 |    Running [23/37]: qos-policy-programmer-service

2016-10-18 09:11:26,234 |    Running [24/37]: identity-manager-pxgrid-service file-service

2016-10-18 09:11:47,746 |    Running [25/37]: nbar-policy-programmer-service

2016-10-18 09:12:01,325 |    Running [26/37]: node-ui

2016-10-18 09:12:26,052 |    Running [27/37]: visibility-service

2016-10-18 09:13:55,137 |    Running [28/37]: app-vis-policy-programmer-service

2016-10-18 09:15:11,462 |    Running [29/37]: pfr-policy-programmer-service

2016-10-18 09:15:34,205 |    Running [30/37]: ip-pool-manager-service

2016-10-18 09:15:45,073 |    Running [31/37]: policy-analysis-service

2016-10-18 09:16:53,754 |    Running [32/37]: qos-lan-policy-programmer-service

2016-10-18 09:21:37,240 |    Running [33/37]: apic-em-event-service

2016-10-18 09:22:48,321 |    Running [34/37]: apic-em-jboss-ejbca

2016-10-18 09:42:41,412 |    Running [35/37]: apic-em-network-programmer-service

2016-10-18 09:45:02,824 |    Running [36/37]: network-poller-service

2016-10-18 09:47:09,147 |    Running [37/37]: apic-em-inventory-manager-service

2016-10-18 09:47:09,207 |    Service re-balancing not required

2016-10-18 09:47:09,207 | Configuring Update Service...

2016-10-18 09:47:12,044 | CONFIGURATION COMPLETED

The configuration wizard has completed successfully!
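
To see which services account for most of the wait, the timestamps in output like the above can be turned into per-step durations. This is only a rough sketch: `step_durations` is a hypothetical helper name, it assumes the output was saved to a file in exactly the format shown (timestamp, `|`, message), and it assumes all lines fall on the same day.

```shell
#!/bin/sh
# Hypothetical helper: for each "Running [n/m]: ..." line, print the
# seconds elapsed until the next timestamped line, then the service names.
# Reads the file(s) given as arguments, or stdin if none are given.
step_durations() {
  awk -F'|' '
    /^[0-9][0-9][0-9][0-9]-/ {
      split($1, dt, " ")          # dt[2] = HH:MM:SS,mmm
      split(dt[2], t, /[:,]/)     # t[1]=HH, t[2]=MM, t[3]=SS
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (prev_name != "")
        printf "%6d s  %s\n", now - prev_time, prev_name
      prev_name = ""
      if ($2 ~ /Running \[/) {
        name = $2
        sub(/^.*\]: */, "", name) # drop "Running [n/m]: "
        prev_name = name
        prev_time = now
      }
    }
  ' "$@"
}

# Example (the file name is an assumption, use wherever you saved the output):
#   step_durations wizard_output.txt | sort -rn | head -5
```

Sorting the result numerically puts the slowest steps first, which makes long starts such as apic-em-jboss-ejbca easy to spot.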

Regards,

Hilario.
