09-21-2017 04:05 AM - edited 03-01-2019 04:41 AM
Hi there,
I attempted to upgrade from v1.5.0.1368 to v1.5.1.1054 early this morning. It initially failed due to NTP sync issues, which I resolved before attempting the upgrade again. After waiting over two hours for it to complete, I refreshed the page and eventually it 404'd.
Grapevine has no record of the second failed upgrade:
$ grape update history
UPDATE                                PROPERTY     VALUE
----------------------------------------------------------------------
63c88792-9eaa-11e7-887d-005056bc0ab3  finished     Thu Sep 21, 2017 08:55:12 AM (2 hrs ago)
63c88792-9eaa-11e7-887d-005056bc0ab3  reason       Unable to sync with any of the configured NTP servers.
63c88792-9eaa-11e7-887d-005056bc0ab3  status       failed
63c88792-9eaa-11e7-887d-005056bc0ab3  task_id      63c88792-9eaa-11e7-887d-005056bc0ab3
63c88792-9eaa-11e7-887d-005056bc0ab3  update_from  1.5.0.1368
63c88792-9eaa-11e7-887d-005056bc0ab3  update_to
63c88792-9eaa-11e7-887d-005056bc0ab3  username     admin
#
I restarted the service hoping that might bring it back to life, but now it looks even worse. I can get to the web GUI login page but no further, and the instance status looks like this:
$ grape instance status
SERVICE VERSION STATE CLIENT IP UPTIME
-----------------------------------------------------------------------------------------------------------------------------
APIC-EM-CAA-SERVICE 5.1.31 harvested None None
access-policy-programmer-service 5.1.31.1368 harvested None None
apic-em-event-service 5.1.31.1368 harvested None None
apic-em-inventory-manager-service 5.1.31.1368 harvested None None
apic-em-jboss-ejbca 5.1.31.1368 harvested None None
apic-em-network-programmer-service 5.1.31.1368 harvested None None
apic-em-pki-broker-service 1.5.0.1368 harvested None None
cas-service 5.0.31.4021 harvested None None
cassandra 1.0.0 running 8ad77cb0-530a-43f4-b952-3e2bf2a2eb1c 169.254.0.1 86 days, 0:12:56
election-service 5.0.31.4021 harvested None None
file-service 5.1.31.1368 harvested None None
grapevine 1.0.0 running 8ad77cb0-530a-43f4-b952-3e2bf2a2eb1c 169.254.0.1 86 days, 0:12:52
grapevine-coordinator-service 1.0.0 running 8ad77cb0-530a-43f4-b952-3e2bf2a2eb1c 169.254.0.1 86 days, 0:12:45
grapevine-log-collector 1.0.0 running 8ad77cb0-530a-43f4-b952-3e2bf2a2eb1c 169.254.0.1 86 days, 0:12:41
grouping-service 5.1.31.1368 harvested None None
identity-manager-pxgrid-service 5.1.31.1368 harvested None None
nbar-policy-programmer-service 5.1.31.1368 harvested None None
network-poller-service 5.1.31.1368 harvested None None
node-ui 1.0.0 running 8ad77cb0-530a-43f4-b952-3e2bf2a2eb1c 169.254.0.1 86 days, 0:12:49
pnp-service 5.17.32.35 harvested None None
policy-analysis-service 5.1.31.1368 harvested None None
policy-manager-service 5.1.31.1368 harvested None None
postgres 5.1.31.1368 harvested None None
qos-lan-policy-programmer-service 5.1.31.1368 harvested None None
qos-monitoring-service 5.1.31.1368 harvested None None
qos-policy-programmer-service 5.1.31.1368 harvested None None
rabbitmq 1.0.0 running 8ad77cb0-530a-43f4-b952-3e2bf2a2eb1c 169.254.0.1 86 days, 0:12:59
rbac-service 5.0.31.4021 harvested None None
reverse-proxy 5.0.31.4021 running f27f3a98-a069-4eda-a640-79e0a3c29800 169.254.0.1 0:20:03
router 5.0.31.4021 running f27f3a98-a069-4eda-a640-79e0a3c29800 169.254.0.1 0:20:04
scheduler-service 5.1.31.1368 harvested None None
task-service 5.1.31.1368 harvested None None
telemetry-service 5.1.31.1368 harvested None None
topology-service 1.5.0.1368 harvested None None
(grapevine)
[Thu Sep 21 10:56:08 UTC] grapevine@10.209.120.40 (grapevine-root-1) ~
Apart from the services listed as 'running', all the others cycle between 'deploying' and 'harvested'.
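For anyone wanting to track which services are stuck between polls, here is a minimal Python sketch that groups services by state from saved `grape instance status` output. It assumes only what the listing above shows: that the first three whitespace-separated fields on each row are the service name, version, and state. The function name and sample text are illustrative and not part of any Cisco tooling.

```python
from collections import defaultdict


def group_services_by_state(status_text):
    """Map each state (e.g. 'running', 'harvested') to a list of service names."""
    by_state = defaultdict(list)
    for line in status_text.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed rule, and anything too short to be a row.
        if len(fields) < 3:
            continue
        # Skip the header row and shell/prompt lines such as "[Thu Sep 21 ...]".
        if fields[0] == "SERVICE" or fields[0].startswith(("[", "$", "(")):
            continue
        service, _version, state = fields[:3]
        by_state[state].append(service)
    return dict(by_state)


# A few rows copied from the output above, used as sample input.
SAMPLE = """\
SERVICE VERSION STATE CLIENT IP UPTIME
-----------------------------
cassandra 1.0.0 running 8ad77cb0-530a-43f4-b952-3e2bf2a2eb1c 169.254.0.1 86 days, 0:12:56
election-service 5.0.31.4021 harvested None None
topology-service 1.5.0.1368 harvested None None
"""

if __name__ == "__main__":
    for state, services in sorted(group_services_by_state(SAMPLE).items()):
        print(f"{state}: {len(services)} -> {', '.join(services)}")
```

Running it against output captured a minute or so apart should make it easier to see which services flip between 'deploying' and 'harvested' and which never leave 'harvested'.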
Any ideas?
cheers,
Seb.
09-21-2017 04:38 AM
Did you do a "reset_grapevine" (saying "N" to all the questions)?
That should get you back to a sane state.
09-21-2017 06:32 AM
Hi Adam,
I've tried that but get:
2017-09-21 13:02:41,955 | Attempting to sync with time server pool.ntp.org...
2017-09-21 13:02:50,786 | Unable to sync with time server pool.ntp.org
2017-09-21 13:02:55,791 | Configuring NTP (attempt #2)...
2017-09-21 13:02:55,792 | Attempting to sync with time server pool.ntp.org...
2017-09-21 13:03:04,623 | Unable to sync with time server pool.ntp.org
2017-09-21 13:03:09,629 | Configuring NTP (attempt #3)...
2017-09-21 13:03:09,630 | Attempting to sync with time server pool.ntp.org...
2017-09-21 13:03:18,462 | Unable to sync with time server pool.ntp.org
2017-09-21 13:03:23,468 | Configuring NTP (attempt #4)...
2017-09-21 13:03:23,468 | Attempting to sync with time server pool.ntp.org...
2017-09-21 13:03:32,302 | Unable to sync with time server pool.ntp.org
2017-09-21 13:03:37,308 | Configuring NTP (attempt #5)...
2017-09-21 13:03:37,310 | Attempting to sync with time server pool.ntp.org...
2017-09-21 13:03:46,092 | Unable to sync with time server pool.ntp.org
2017-09-21 13:03:51,098 | Unable to configure NTP after 5 attempts
2017-09-21 13:03:51,266 | [configure_ntp:2872] Unable to configure NTP. Please confirm NTP server connectivity and settings.
2017-09-21 13:03:51,266 | Config wizard completed with errors
I've edited /etc/ntp.conf to use a stratum 1 source I know which works:
root@grapevine-root-1:/home/grapevine# ntpq -pn
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xx.xx.xx .GPS. 1 u 24 64 377 83.351 -0.224 6.530
However, it looks like the reset_grapevine script must be hardcoded to use pool.ntp.org. I've even tried editing /etc/hosts so that pool.ntp.org resolves to my NTP source, but that has had no effect.
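One way to check whether a given time server is reachable from the controller at all, independently of ntpd and the wizard, is a bare SNTP query over UDP port 123. The Python sketch below is an illustration of that idea only. It is not what reset_grapevine does internally, and the function names are made up for this example.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request():
    # LI=0, VN=3, Mode=3 (client) gives a first byte of 0x1b; the other 47 bytes are zero.
    return b"\x1b" + 47 * b"\x00"


def parse_sntp_response(packet):
    """Return (stratum, unix_time) from an NTP response of at least 48 bytes."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    stratum = packet[1]
    # Bytes 40-47 hold the transmit timestamp: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return stratum, seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_ntp(server, timeout=5.0):
    """Send one SNTP request to `server` and return (stratum, unix_time)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        packet, _addr = sock.recvfrom(512)
    return parse_sntp_response(packet)


if __name__ == "__main__":
    import sys

    # Pass one or more server names or addresses on the command line.
    for server in sys.argv[1:]:
        try:
            stratum, unix_time = query_ntp(server)
            print(f"{server}: stratum {stratum}, time {unix_time:.3f}")
        except (OSError, ValueError) as exc:
            print(f"{server}: no usable reply ({exc})")
```

Querying both pool.ntp.org and the internal stratum 1 source from the controller would show whether the failure is a reachability problem (timeout or DNS error) or something specific to how the wizard picks its server.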
Looking at /opt/cisco/grapevine/bin/grapevine_factory_reset shows that it is trying to run the config wizard for 1.5.1.4018:
load_entry_point('grapevine-config-wizard==1.5.1.4018.dev1083-gf84d517', 'console_scripts', 'grapevine_factory_reset')()
...surely if the upgrade didn't successfully complete this should be trying to run a v1.5.0.1368 reset script?
cheers,
Seb.
09-21-2017 02:33 PM
I think you must have a "bad" NTP server in your grapevine config file.
What is the NTP setting in /etc/grapevine/controller-config.json?
That is the setting which will be used on a reset_grapevine.
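A quick way to see what that file currently holds is to dump every NTP-related entry from it. The sketch below is an assumption-laden helper: the layout and key names of the file are not shown in this thread, so it simply searches the whole JSON document for any key containing "ntp". The default path is the one quoted above, and the function name is hypothetical.

```python
import json


def find_keys_containing(node, needle, path=""):
    """Yield (path, value) for every key whose name contains `needle` (case-insensitive)."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if needle.lower() in key.lower():
                yield child, value
            # Keep descending in case matching keys are nested deeper.
            yield from find_keys_containing(value, needle, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_keys_containing(item, needle, f"{path}[{index}]")


if __name__ == "__main__":
    import sys

    # Path as quoted in this thread; pass a different one as the first argument.
    default_path = "/etc/grapevine/controller-config.json"
    config_path = sys.argv[1] if len(sys.argv) > 1 else default_path
    with open(config_path) as handle:
        config = json.load(handle)
    for key_path, value in find_keys_containing(config, "ntp"):
        print(f"{key_path} = {value!r}")
```

If pool.ntp.org shows up in the output, that would be consistent with the reset log trying that server regardless of what /etc/ntp.conf says.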
09-21-2017 05:37 PM
I forgot to mention: if you need to change your NTP setting in controller_config.json, you can do it through the config_wizard.
Again, at the end of the config_wizard, make sure you say "N" to destroy disks to maintain state.
09-22-2017 07:21 AM
I tried the config_wizard method and it looks to fail on the same step:
The configuration wizard has encountered the following error:
Timeout of 3600 seconds has been exceeded while growing services. The following services are not yet in RUNNING state: scheduler-service, identity-manager-pxgrid-service, policy-manager-service, task-service, topology-service, policy-analysis-service, telemetry-service
Use the "back" button to revisit previous wizard screens to correct any errors...
09-22-2017 02:14 AM
It was going so well. I manually edited controller_config.json, as the two options in config_wizard didn't look applicable :/
After editing the file reset_grapevine was progressing nicely until:
2017-09-22 08:04:46,794 | | Running [26/34]: apic-em-inventory-manager-service apic-em-network-programmer-service |
2017-09-22 08:05:01,932 | | Running [27/34]: network-poller-service |
2017-09-22 08:57:32,234 | [grow_all_services:1161] Timeout of 3600 seconds has been exceeded while growing services. The following services are not yet in RUNNING state: scheduler-service, identity-manager-pxgrid-service, policy-manager-service, task-service, topology-service, policy-analysis-service, telemetry-service
2017-09-22 08:57:32,235 | Config wizard completed with errors
(grapevine)
[Fri Sep 22 08:57:32 UTC] grapevine@10.209.120.40 (grapevine-root-1) ~
Is there a way to get the reset_grapevine to reset to the previous version and not the new v1.5.1.1054 version which didn't install correctly?
cheers,
Seb.