
APIC-EM does not restart after restore

Nathan Sowatskey
Cisco Employee

Hi

I experimented with the backup and restore functions of APIC-EM today.

I created a backup, which seemed to be fine, and then I attempted to restore it.

APIC-EM seemed to get as far as the "Starting Controller (3/4)" stage, and then no further for over 30 minutes.

I then restarted the APIC-EM VM, and APIC-EM eventually, kind of, came back such that I could log in. But when I tried to view the device inventory, I was presented with a message that the inventory service had not started. That also lasted for over 30 minutes. I then tried the restore again, and I am looking at the same "3/4" stage message, which has been there for the last four hours.

I suspect that this is broken.

I shall reinstall APIC-EM, and keep the old VM in case anyone wants to try and debug this.

Regards

Nathan


rnaggi
Cisco Employee

Nathan,

Could you let me know which APIC-EM version was running on Day-0 and which version you were trying to upgrade to? I will try to find an answer based on your issue.

Hi Rohan

I am using 1.2GA. I am not trying to upgrade APIC-EM.

Regards

Nathan

Would it be possible for you to share the RCA logs?

1. SSH to the APIC-EM cluster

2. su root

3. RCA

This will create a tar.gz file.

We can look at the log file and see why this is happening.

rca_command.tiff

Rohan

The attached image shows my CLI session where I have attempted to follow your instructions. It did not work.

What do you suggest please?

Many thanks

Nathan

Hi Nathan,

For a problem I had myself, I was also asked to run the RCA command.

This is what I did to get the file:

  1. Log in to the server CLI as the user grapevine, with the password you set during installation.
  2. You will then see a prompt ($ in my case).
  3. Type: rca (lowercase)
  4. Enter the password of the user grapevine again.
  5. A tar file is created with the name "grapevine-root-{xxx.xxx.xxx.xxx}-rca-2016-03-16_20-39-16_UTC+0000"
  6. The file is stored in the "root/tmp" directory; you can use WinSCP (as the grapevine user) to grab the file to a local disk.


I hope it also works for you.

Please let me know if it works.


Greetings

Palermo

Thank you, a variation of this worked

Rohan

Please see the email I have sent to you from my nathan@nathan.to account.

Many thanks

Nathan

Rohan

Over the weekend I ran this test again. I had restored the APIC-EM instance to the default snapshot, which was taken after initial install. Note that the initial install was based on a database restore using the following instructions:

"How to load the pre-populated DB on the APIC-EM 1.2 code:

1. Download both files from the following Box folder and copy them to the /tmp directory on the sandbox APIC-EM.

link sent separately in email

2. Cd to the /tmp directory - "cd /tmp"

3. Untar the backup.tar file - "sudo tar -zxvf backup.tar" (you should see the /tmp/backup folder created on your APIC-EM)

4. Run the following command to load this DB: "sudo java -jar /tmp/dbrecovery-4.0.0-SNAPSHOT.jar restore /tmp/backup"

Before you do this, be sure to run reset_grapevine on this APIC-EM to ensure that the DB is clean."

After reverting to the snapshot, I did the following:

- Created a discovery job.

- Ran the discovery and discovered a CSR1Kv running in my test lab.

- Backed up the system

- Restored from the latest backup

The restore got as far as stage 3/4 and has been there for ~24 hours.

So, this seems to be replicable and consistent.

Regards

Nathan

Nathan,

I have forwarded the RCA to the dev team for further debugging. I have also sent a separate email to the dev team and copied you on it.

As a next step, we will wait for their answer.

Could you also try (if time permits) to create a new VM with an APIC-EM cluster, do a fresh install, and then try backup/restore? We (the engineering team) have tried backup/restore, and it seems to be working for us with the 1.2.x build.

Hope all this helps you to move forward.

rnaggi
Cisco Employee

Nathan,

Here is the response from the dev team. NTP was the issue.

There is an NTP connectivity error:

Error occurred while running restore operation

[task_id=c444b23a-4a7d-11e6-be89-0a01c0a8010d]: Unable to configure NTP.

Please confirm NTP server connectectivity and settings.

You can check /var/log/grapevine_manager_activity.log and /var/log/grapevine_manager.log for restore logs.
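
For anyone who wants to confirm NTP server connectivity from a lab host before retrying a restore, here is a minimal sketch in Python. It is not part of APIC-EM and is not what the dev team used. It only sends a basic SNTP client request over IPv4/UDP port 123 and decodes the server's transmit timestamp. The function names and the command-line usage are my own assumptions for illustration.

```python
import socket
import struct
import sys

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_sntp_request() -> bytes:
    """Return a 48-byte SNTP client request (LI=0, VN=3, Mode=3)."""
    return b"\x1b" + 47 * b"\x00"


def parse_sntp_response(data: bytes) -> float:
    """Return the server transmit time as Unix seconds, or raise ValueError."""
    if len(data) < 48:
        raise ValueError("NTP response too short")
    mode = data[0] & 0x07
    if mode != 4:
        raise ValueError("not a server-mode NTP response")
    if data[1] == 0:
        # Stratum 0 means the server is refusing or is unsynchronized.
        raise ValueError("server returned stratum 0")
    # The transmit timestamp occupies bytes 40..47: seconds, then fraction.
    seconds, fraction = struct.unpack("!II", data[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_ntp(server: str, timeout: float = 5.0, port: int = 123) -> float:
    """Query one NTP server; socket.timeout/OSError means it is unreachable."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, port))
        data, _ = sock.recvfrom(512)
    return parse_sntp_response(data)


if __name__ == "__main__":
    # Pass the NTP server configured for your APIC-EM as the first argument.
    print(query_ntp(sys.argv[1]))
```

If the script times out or raises an error, the NTP server is probably not reachable from that host. A successful reply only shows basic reachability, so the APIC-EM NTP settings themselves still need to be checked.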

Once NTP was fixed in the lab, the restore worked.

Many thanks

Nathan

Hi Nathan,

Did you do anything (on APIC-EM) after configuring NTP server on your network?

Thanks!

Hi

After configuring NTP, all was fine.

Many thanks

Nathan

Thanks again Nathan,

In my case it was necessary to run config_wizard again, reboot, and wait more than an hour for APIC-EM to recover.

This is my log (sorry if it is too long).

Note: apic-em-jboss-ejbca takes about 20 minutes to come up.

2016-10-18 08:53:04,221 |    Running [4/37]: grapevine grapevine-coordinator-service rabbitmq cassandra

2016-10-18 08:53:19,338 |    Running [5/37]: router

2016-10-18 08:54:24,890 |    Running [6/37]: reverse-proxy

2016-10-18 08:54:45,068 |    Running [8/37]: postgres election-service

2016-10-18 08:54:50,114 |    Running [9/37]: log-aggregator

2016-10-18 08:55:15,356 |    Running [10/37]: cas-service

2016-10-18 08:56:10,829 |    Running [11/37]: node-ui

2016-10-18 08:56:56,301 |    Running [12/37]: rbac-service

2016-10-18 08:58:17,093 |    Running [13/37]: ipgeo-service

2016-10-18 09:01:59,413 |    Running [14/37]: remote-ras

2016-10-18 09:06:06,965 |    Running [15/37]: topology-service

2016-10-18 09:06:17,057 |    Running [16/37]: scheduler-service

2016-10-18 09:08:23,342 |    Running [17/37]: policy-manager-service

2016-10-18 09:08:28,413 |    Running [18/37]: apic-em-pki-broker-service

2016-10-18 09:09:54,862 |    Running [19/37]: task-service

2016-10-18 09:10:12,619 |    Running [21/37]: pnp-service access-policy-programmer-service

2016-10-18 09:10:25,549 |    Running [22/37]: telemetry-service

2016-10-18 09:10:44,655 |    Running [23/37]: qos-policy-programmer-service

2016-10-18 09:11:26,234 |    Running [24/37]: identity-manager-pxgrid-service file-service

2016-10-18 09:11:47,746 |    Running [25/37]: nbar-policy-programmer-service

2016-10-18 09:12:01,325 |    Running [26/37]: node-ui

2016-10-18 09:12:26,052 |    Running [27/37]: visibility-service

2016-10-18 09:13:55,137 |    Running [28/37]: app-vis-policy-programmer-service

2016-10-18 09:15:11,462 |    Running [29/37]: pfr-policy-programmer-service

2016-10-18 09:15:34,205 |    Running [30/37]: ip-pool-manager-service

2016-10-18 09:15:45,073 |    Running [31/37]: policy-analysis-service

2016-10-18 09:16:53,754 |    Running [32/37]: qos-lan-policy-programmer-service

2016-10-18 09:21:37,240 |    Running [33/37]: apic-em-event-service

2016-10-18 09:22:48,321 |    Running [34/37]: apic-em-jboss-ejbca

2016-10-18 09:42:41,412 |    Running [35/37]: apic-em-network-programmer-service

2016-10-18 09:45:02,824 |    Running [36/37]: network-poller-service

2016-10-18 09:47:09,147 |    Running [37/37]: apic-em-inventory-manager-service

2016-10-18 09:47:09,207 |    Service re-balancing not required

2016-10-18 09:47:09,207 | Configuring Update Service...

2016-10-18 09:47:12,044 | CONFIGURATION COMPLETED

The configuration wizard has completed successfully!
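
To see which services account for most of the wait, the gaps between consecutive "Running" lines can be computed. The Python sketch below is only an illustration. It assumes the log format shown above, and it treats the time until the next "Running" line as the duration of a step, which is my interpretation and not documented behaviour. The names `step_durations` and `SAMPLE_LOG` are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2016-10-18 09:22:48,321 |    Running [34/37]: apic-em-jboss-ejbca
LINE_RE = re.compile(r"^(\S+ \S+) \|\s+Running \[(\d+)/(\d+)\]: (.+)$")

# A few lines copied from the log above.
SAMPLE_LOG = """
2016-10-18 09:21:37,240 |    Running [33/37]: apic-em-event-service
2016-10-18 09:22:48,321 |    Running [34/37]: apic-em-jboss-ejbca
2016-10-18 09:42:41,412 |    Running [35/37]: apic-em-network-programmer-service
2016-10-18 09:45:02,824 |    Running [36/37]: network-poller-service
"""


def step_durations(log_text):
    """Return (services, seconds) pairs, one per step that has a successor."""
    steps = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            started = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            steps.append((started, match.group(4).strip()))
    # The last step is omitted because no later "Running" line bounds it.
    return [
        (name, (next_started - started).total_seconds())
        for (started, name), (next_started, _) in zip(steps, steps[1:])
    ]


if __name__ == "__main__":
    # Print the steps from slowest to fastest.
    for name, seconds in sorted(step_durations(SAMPLE_LOG), key=lambda s: -s[1]):
        print(f"{seconds / 60:6.1f} min  {name}")
```

Run against the full log, this makes the roughly 20-minute gap after apic-em-jboss-ejbca easy to spot.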

Regards,

Hilario.