06-28-2017 12:10 AM - edited 03-17-2019 10:39 AM
Hello
Has anyone come across a similar issue and if so what was the fix?
Currently I have a cluster (version 10.5.2) that shows the runtime state on all subscribers as "Syncing...".
When I look at the Cisco Unified Reporting pages I see all the subs as Initializing. However, normal functionality is fine and backups also run without issue. When I run a status command on dbreplication I can see everything connected.
I have gone through troubleshooting dbreplication with a repair, etc., as well as restarting all the servers. I am unsure where to go next in troubleshooting, other than upgrading the cluster. It seems very strange, as it is having no impact on daily use.
The only anomaly I see is that NTP is at stratum 5, and that does give me a message saying it is not recommended.
Thank you all
FF
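(For anyone reading this later: the number in parentheses in the `utils dbreplication runtimestate` output is the RTMT "Replication Setup" state. A minimal sketch of the documented code-to-meaning mapping; the helper name below is mine, not a Cisco tool:)

```python
# RTMT "Replication Setup" state codes as documented for CUCM dbreplication.
# State 2 is the only healthy state; 0 means setup is in progress (or stuck).
REPL_STATES = {
    0: "Initializing -- replication setup in progress or not working",
    1: "Number of replicates is not correct",
    2: "Replication is good",
    3: "Tables are suspect -- replication is bad",
    4: "Setup failed / replication not set up",
}

def describe_repl_state(code: int) -> str:
    """Translate an RTMT replication state code into a description."""
    return REPL_STATES.get(code, "Unknown state code")
```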
06-28-2017 12:50 AM
Hi FF,
You would have to fix NTP first; try keeping the stratum below 3. Can you attach the output of the commands below?
utils diagnose test
utils ntp status
JB
07-13-2017 04:20 AM
Hi JB
We have had a network outage since we last spoke, and interestingly I am now seeing a change for my local cluster:
PING DB/RPC/ REPL. Replication REPLICATION SETUP
SERVER-NAME IP ADDRESS (msec) DbMon? QUEUE Group ID (RTMT) & Details
----------- ---------- ------ ------- ----- ----------- ------------------
PUB01 10.PPP.1P3.PPP 0.015 Y/Y/Y 0 (g_2) (2) Setup Completed
1SUB02 10.PPP.1P3.PPP 0.182 Y/Y/Y 0 (g_3) (2) Setup Completed
1TFTP03 10.PPP.1P3.PPP 0.140 Y/Y/Y 0 (g_4) (2) Setup Completed
2SUB01 10.MMM.1M3.MMM 43.499 Y/Y/Y -- (-) (0) Syncing...
2SUB02 10.MMM.1M3.MMM 45.028 Y/Y/Y 0 (g_6) (0) Syncing...
2TFTP03 10.MMM.1M3.MMM 43.610 Y/Y/Y 0 (g_7) (0) Syncing...
-------
The last three members now seem to hang on Syncing. Those three are remote to my location.
When I run a status on the replication I can see one member missing:
SERVER ID STATE STATUS QUEUE CONNECTION CHANGED
-----------------------------------------------------------------------
g_2_ccm10_5_2_13900_12 2 Active Local 0
g_3_ccm10_5_2_13900_12 3 Active Connected 0 Jun 28 15:31:21
g_4_ccm10_5_2_13900_12 4 Active Connected 0 Jun 28 15:31:24
g_6_ccm10_5_2_13900_12 6 Active Connected 0 Jul 12 07:37:45
g_7_ccm10_5_2_13900_12 7 Active Connected 0 Jul 13 20:34:26
--------
Diagnostics test:
admin:utils diagnose test
Log file: platform/log/diag3.log
Starting diagnostic test(s)
===========================
test - disk_space : Passed (available: 7099 MB, used: 12529 MB)
skip - disk_files : This module must be run directly and off hours
test - service_manager : Passed
test - tomcat : Passed
test - tomcat_deadlocks : Passed
test - tomcat_keystore : Passed
test - tomcat_connectors : Passed
test - tomcat_threads : Passed
test - tomcat_memory : Passed
test - tomcat_sessions : Passed
skip - tomcat_heapdump : This module must be run directly and off hours
test - validate_network : Passed
test - raid : Passed
test - system_info : Passed (Collected system information in diagnostic log)
test - ntp_reachability : Warning
The host 10.2M2.MMM.6 is not reachable, or it's NTP service is down.
The host 10.2P0.1P1.101 is not reachable, or it's NTP service is down.
Some of the configured external NTP servers are not reachable.
It is recommended that for better time synchronization all of
the NTP servers be reachable.
Please use the OS Admin GUI to add/remove NTP servers.
test - ntp_clock_drift : Passed
test - ntp_stratum : Failed
The reference NTP server is a stratum 5 clock.
NTP servers with stratum 5 or worse clocks are deemed unreliable.
Please consider using an NTP server with better stratum level.
Please use OS Admin GUI to add/delete NTP servers.
skip - sdl_fragmentation : This module must be run directly and off hours
skip - sdi_fragmentation : This module must be run directly and off hours
Diagnostics Completed
--------
NTP Details:
ntpd (pid 27758) is running...
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.2X2.XXX.1 10.2A0.AAA.8 6 u 281 1024 377 42.281 -2.090 1.852
*10.1X0.XXX.4 130.88.200.6 4 u 283 1024 377 220.381 0.128 3.925
+10.2X0.XX.1 10.1A0.AAA.4 5 u 1015 1024 377 1.290 0.466 0.853
10.2M2.MMM.6 .XFAC. 16 u - 1024 0 0.000 0.000 0.000
10.2P2.1P1.101 .XFAC. 16 u - 1024 0 0.000 0.000 0.000
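For reference when reading the peer table above: the first character is the tally code (`*` = selected system peer, `+` = candidate), `st` is the stratum, and a peer sitting at stratum 16 with `reach` 0 is effectively dead (which matches the last two rows). A rough sketch of parsing one peer line, assuming the standard `ntpq -p` column layout; the helper names are mine:

```python
from dataclasses import dataclass

@dataclass
class Peer:
    tally: str    # '*' selected sys peer, '+' candidate, ' ' not selected
    remote: str
    stratum: int  # 16 means unsynchronized/unreachable
    reach: int    # 8-bit reachability register (octal 377 = fully reachable)

def parse_peer(line: str) -> Peer:
    """Parse one peer line of standard `ntpq -p` / `utils ntp status` output."""
    tally = line[0] if line[0] in "*+#o-x. " else " "
    fields = line.lstrip("*+#o-x. ").split()
    # fields: remote refid st t when poll reach delay offset jitter
    return Peer(tally=tally, remote=fields[0],
                stratum=int(fields[2]), reach=int(fields[6], 8))

def unusable(p: Peer) -> bool:
    """A stratum-16 or never-reached peer contributes nothing to sync."""
    return p.stratum == 16 or p.reach == 0
```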
Thanks again
FF
07-13-2017 04:26 AM
Hi,
You can clearly see the issue with your NTP:
test - ntp_reachability : Warning
The host 10.2M2.MMM.6 is not reachable, or it's NTP service is down.
The host 10.2P0.1P1.101 is not reachable, or it's NTP service is down.
Some of the configured external NTP servers are not reachable.
It is recommended that for better time synchronization all of
the NTP servers be reachable.
Please use the OS Admin GUI to add/remove NTP servers.
test - ntp_clock_drift : Passed
test - ntp_stratum : Failed
Cisco recommends the stratum stay below 4. Fix NTP first and then issue "utils dbreplication reset all" on the publisher.
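That rule of thumb can be codified as a quick sanity check before running the reset. This is a sketch only, taking hypothetical (tally, stratum, reach) tuples read off the `utils ntp status` peer table:

```python
def ntp_healthy(peers):
    """peers: list of (tally, stratum, reach) tuples from the NTP peer table.

    True only if a system peer ('*') is selected at stratum 4 or better,
    and every configured peer is reachable (stratum < 16, reach > 0).
    """
    has_good_sys_peer = any(t == "*" and st <= 4 for t, st, _ in peers)
    all_reachable = all(st < 16 and reach > 0 for _, st, reach in peers)
    return has_good_sys_peer and all_reachable
```

Against the output posted above (two stale stratum-16 entries), this would say NTP is not yet healthy even though a stratum-4 system peer is selected.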
(Rate if it helps)
JB
07-17-2017 11:11 PM
I have finally fixed the issue.
It looks like stratum level 5 does not cause an impact by itself, as all the servers were still syncing to the same level.
What I have done since is remove the legacy NTP servers mentioned above, but I have also rebooted the servers in the cluster. Whether it was one of these or a combination of both, I'm not sure.
Thanks for the advice.