I experimented with the backup and restore functions of APIC-EM today.
I created a backup, which seemed to be fine, and then I attempted to restore it.
APIC-EM seemed to get as far as the "Starting Controller (3/4)" stage, and then no further for over 30 minutes.
I then restarted the APIC-EM VM, and APIC-EM eventually came back far enough that I could log in. But when I tried to view the device inventory, I was shown a message that the inventory service had not started. That also lasted for over 30 minutes, so I tried the restore again. It is once more showing the same "3/4" stage message, and has been for the last four hours.
I suspect that this is broken.
I shall reinstall APIC-EM, and keep the old VM in case anyone wants to try and debug this.
Could you let me know which APIC-EM version was running on Day-0 and which version you were trying to upgrade to? I will try to find an answer based on your issue.
Would it be possible for you to share the RCA logs?
1. SSH to the APIC-EM cluster
2. su root
This will create a tar.gz file.
We can look at the log file and see why this is happening.
The attached image shows my CLI session where I have attempted to follow your instructions. It did not work.
What do you suggest please?
For a problem I had myself, I was also asked to run the RCA command.
This is what I did to get the file:
I hope it also works for you.
Please let me know if it works.
Over the weekend I ran this test again. I had restored the APIC-EM instance to the default snapshot, which was taken after initial install. Note that the initial install was based on a database restore using the following instructions:
"How to load the pre-populated DB on the APIC-EM 1.2 code:
1. Download both files from the following box folder and copy them to the sandbox APIC-EM /tmp directory.
link sent separately in email
2. Cd to the /tmp directory - "cd /tmp"
3. Untar the backup.tar file - "sudo tar -zxvf backup.tar" (you should see the /tmp/backup folder created on your APIC-EM)
4. Run the following command to load this DB: "sudo java -jar /tmp/dbrecovery-4.0.0-SNAPSHOT.jar restore /tmp/backup"
Before you do this, be sure to run reset_grapevine on this APIC-EM to ensure that the DB is clean."
After reverting to the snapshot, I did the following:
- Created a discovery job.
- Ran the discovery and discovered a CSR1Kv running in my test lab.
- Backed up the system.
- Restored from the latest backup.
The restore got as far as stage 3/4 and has been there for ~24 hours.
So, this seems to be replicable and consistent.
I have forwarded the RCA to the dev team for further debugging. I have also sent a separate email to the dev team and copied you on it.
As a next step, we will wait for their answer.
If time permits, could you also create a new VM with an APIC-EM cluster, do a fresh install, and then try the backup/restore? We (the engineering team) have tried backup/restore, and it seems to be working for us with the 1.2.x build.
Hope all this helps you to move forward.
Here is the response from the dev team. NTP was the issue.
There is an NTP connectivity error:
Error occurred while running restore operation
[task_id=c444b23a-4a7d-11e6-be89-0a01c0a8010d]: Unable to configure NTP.
Please confirm NTP server connectectivity and settings.
You can check /var/log/grapevine_manager_activity.log and /var/log/grapevine_manager.log for restore logs.
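If it helps anyone else hitting the same symptom, here is a minimal Python sketch for pulling NTP-related lines out of those two log files. The file paths are the ones named above; the search pattern (any line mentioning "NTP", case-insensitive) and the function names are my own assumptions, based only on the single error message quoted in this post. It will probably need to run as root on the APIC-EM host to read the files.

```python
import re
from pathlib import Path

# Log paths are the ones named in the post above. The pattern is an assumption
# based on the single error message quoted there.
LOG_PATHS = [
    Path("/var/log/grapevine_manager_activity.log"),
    Path("/var/log/grapevine_manager.log"),
]
NTP_PATTERN = re.compile(r"ntp", re.IGNORECASE)


def find_ntp_lines(lines):
    """Return (line_number, text) pairs for lines that mention NTP."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if NTP_PATTERN.search(line)
    ]


def scan_logs(paths=LOG_PATHS):
    """Scan each readable log file and collect NTP-related lines per path."""
    hits = {}
    for path in paths:
        try:
            with path.open(errors="replace") as handle:
                hits[str(path)] = find_ntp_lines(handle)
        except OSError:
            # Missing or unreadable file (for example, not running as root).
            hits[str(path)] = None
    return hits


if __name__ == "__main__":
    for path, matches in scan_logs().items():
        if matches is None:
            print(f"{path}: could not be read")
            continue
        print(f"{path}: {len(matches)} line(s) mentioning NTP")
        for number, text in matches:
            print(f"  {number}: {text}")
```

A hit such as "Unable to configure NTP" during a restore task would point at the same cause the dev team found here.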
Thanks again Nathan,
In my case it was necessary to run config_wizard again, reboot, and wait more than an hour for APIC-EM to recover.
This is my log (sorry if it is too long).
Note: apic-em-jboss-ejbca takes about 20 minutes to come up.
2016-10-18 08:53:04,221 | Running [4/37]: grapevine grapevine-coordinator-service rabbitmq cassandra
2016-10-18 08:53:19,338 | Running [5/37]: router
2016-10-18 08:54:24,890 | Running [6/37]: reverse-proxy
2016-10-18 08:54:45,068 | Running [8/37]: postgres election-service
2016-10-18 08:54:50,114 | Running [9/37]: log-aggregator
2016-10-18 08:55:15,356 | Running [10/37]: cas-service
2016-10-18 08:56:10,829 | Running [11/37]: node-ui
2016-10-18 08:56:56,301 | Running [12/37]: rbac-service
2016-10-18 08:58:17,093 | Running [13/37]: ipgeo-service
2016-10-18 09:01:59,413 | Running [14/37]: remote-ras
2016-10-18 09:06:06,965 | Running [15/37]: topology-service
2016-10-18 09:06:17,057 | Running [16/37]: scheduler-service
2016-10-18 09:08:23,342 | Running [17/37]: policy-manager-service
2016-10-18 09:08:28,413 | Running [18/37]: apic-em-pki-broker-service
2016-10-18 09:09:54,862 | Running [19/37]: task-service
2016-10-18 09:10:12,619 | Running [21/37]: pnp-service access-policy-programmer-service
2016-10-18 09:10:25,549 | Running [22/37]: telemetry-service
2016-10-18 09:10:44,655 | Running [23/37]: qos-policy-programmer-service
2016-10-18 09:11:26,234 | Running [24/37]: identity-manager-pxgrid-service file-service
2016-10-18 09:11:47,746 | Running [25/37]: nbar-policy-programmer-service
2016-10-18 09:12:01,325 | Running [26/37]: node-ui
2016-10-18 09:12:26,052 | Running [27/37]: visibility-service
2016-10-18 09:13:55,137 | Running [28/37]: app-vis-policy-programmer-service
2016-10-18 09:15:11,462 | Running [29/37]: pfr-policy-programmer-service
2016-10-18 09:15:34,205 | Running [30/37]: ip-pool-manager-service
2016-10-18 09:15:45,073 | Running [31/37]: policy-analysis-service
2016-10-18 09:16:53,754 | Running [32/37]: qos-lan-policy-programmer-service
2016-10-18 09:21:37,240 | Running [33/37]: apic-em-event-service
2016-10-18 09:22:48,321 | Running [34/37]: apic-em-jboss-ejbca
2016-10-18 09:42:41,412 | Running [35/37]: apic-em-network-programmer-service
2016-10-18 09:45:02,824 | Running [36/37]: network-poller-service
2016-10-18 09:47:09,147 | Running [37/37]: apic-em-inventory-manager-service
2016-10-18 09:47:09,207 | Service re-balancing not required
2016-10-18 09:47:09,207 | Configuring Update Service...
2016-10-18 09:47:12,044 | CONFIGURATION COMPLETED
The configuration wizard has completed successfully!
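The note about apic-em-jboss-ejbca can be checked against the timestamps in this log. Below is a small Python sketch that measures the gap between consecutive log lines. It assumes each "Running [n/37]" line is written when that step starts, so the gap to the next line approximates how long the step took; that is my reading of the log, not something the wizard documents. The function names are illustrative, and only four lines of the log are embedded as sample data.

```python
from datetime import datetime

# A few lines copied from the log above; paste the full log to analyse every step.
SAMPLE_LOG = """\
2016-10-18 09:21:37,240 | Running [33/37]: apic-em-event-service
2016-10-18 09:22:48,321 | Running [34/37]: apic-em-jboss-ejbca
2016-10-18 09:42:41,412 | Running [35/37]: apic-em-network-programmer-service
2016-10-18 09:45:02,824 | Running [36/37]: network-poller-service
"""

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_entries(text):
    """Return (timestamp, message) pairs for lines shaped like 'time | message'."""
    entries = []
    for line in text.splitlines():
        stamp, separator, message = line.partition(" | ")
        if not separator:
            continue  # no timestamp separator, e.g. the final banner line
        try:
            when = datetime.strptime(stamp.strip(), TIMESTAMP_FORMAT)
        except ValueError:
            continue  # left-hand side is not a timestamp in the expected format
        entries.append((when, message.strip()))
    return entries


def step_durations(entries):
    """Pair each entry with the seconds elapsed until the next entry."""
    return [
        (message, (later - when).total_seconds())
        for (when, message), (later, _) in zip(entries, entries[1:])
    ]


if __name__ == "__main__":
    durations = step_durations(parse_entries(SAMPLE_LOG))
    # Slowest steps first.
    for message, seconds in sorted(durations, key=lambda item: item[1], reverse=True):
        print(f"{seconds / 60:6.1f} min  {message}")
```

With the sample lines, the apic-em-jboss-ejbca step comes out at just under 20 minutes (09:22:48 to 09:42:41), which matches the note above. The last entry in any input has no following line, so it gets no duration.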