TAC says my NTP setup might be bad, possibly causing our AD issues

ben.posner · ‎08-27-2018

we have an 8 node cluster attached to 6 different Active Directories. randomly a node (or two) will lose connection to one or more ADs, and users will fail authentication when using that policy node. the other nodes are fine. we've been trying to track this down for about a month now.

in investigating logs and such, our recent TAC engineer noted that not all of the nodes are using the same NTP source. each server is configured for our two internal NTP servers. each of the active directories is also attached to the same NTP servers. both NTP servers connect back to the same external source for their time. TAC insists that all nodes should talk to the same NTP server. my response is that they already do, and that using a single NTP server is a terrible idea. what say you all?

example 'show ntp' on an admin node and on an example policy node:

sopsiseadmin/gcore# sh ntp
Configured NTP Servers:
  10.128.10.108
  10.128.10.109

synchronised to NTP server (10.128.10.108) at stratum 3
   time correct to within 78 ms
   polling server every 1024 s

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l  74h   64    0    0.000    0.000   0.000
*10.128.10.108   129.6.15.28      2 u  613 1024  377    0.441    1.378   1.596
+10.128.10.109   129.6.15.28      2 u  981 1024  377    0.385   -3.839   1.734

* Current time source, + Candidate , x False ticker

Warning: Output results may conflict during periods of changing synchronization.

hzniseplc1/gcore# sh ntp
Configured NTP Servers: 
  10.128.10.108
  10.128.10.109

synchronised to NTP server (10.128.10.109) at stratum 3 
   time correct to within 68 ms
   polling server every 1024 s

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l  87h   64    0    0.000    0.000   0.000
+10.128.10.108   129.6.15.28      2 u  180 1024  377    0.180    2.257   1.973
*10.128.10.109   129.6.15.28      2 u  726 1024  377    0.165   -2.366   3.167

* Current time source, + Candidate , x False ticker 

Warning: Output results may conflict during periods of changing synchronization.
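
For anyone comparing many nodes at once, the peer table above can be parsed mechanically. What follows is a minimal Python sketch, not an ISE tool: the names `parse_peers`, `selected_source`, and `LOCAL_CLOCK` are made up for illustration, and it assumes the table keeps the ntpq-style layout shown above, where the first character of each row is the tally code (`*` current source, `+` candidate, `x` false ticker). It reports which server a node selected and whether the non-local peers share one upstream refid.

```python
# Hypothetical helper: parse the peer table printed by 'show ntp'.
# Assumes the ntpq-style column layout shown in the outputs above.

SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l  87h   64    0    0.000    0.000   0.000
+10.128.10.108   129.6.15.28      2 u  180 1024  377    0.180    2.257   1.973
*10.128.10.109   129.6.15.28      2 u  726 1024  377    0.165   -2.366   3.167
"""

LOCAL_CLOCK = "127.127.1.0"  # the node's own local clock driver row


def parse_peers(text):
    """Return a list of dicts, one per peer row found in the text."""
    peers = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("="):
            continue
        tally = line[0]  # ' ', '*', '+', 'x', ...
        fields = line[1:].split()
        # A peer row has exactly 10 columns and a numeric stratum.
        if len(fields) != 10 or not fields[2].isdigit():
            continue
        peers.append({
            "tally": tally,
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "reach": fields[6],
            "offset_ms": float(fields[8]),
        })
    return peers


def selected_source(peers):
    """Return the remote marked '*', or None if nothing is selected."""
    for peer in peers:
        if peer["tally"] == "*":
            return peer["remote"]
    return None


peers = parse_peers(SAMPLE)
upstreams = {p["refid"] for p in peers if p["remote"] != LOCAL_CLOCK}
print("selected:", selected_source(peers))
print("upstream refids:", sorted(upstreams))
```

Running the same function over the output from each node makes it easy to see which nodes picked 108 and which picked 109, and to confirm that both servers report the same refid.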

anthonylofreso · ‎08-27-2018

We have two NTP servers configured. I just checked, and all (6) servers are configured with the same server for Primary, and the same server for Secondary. Thus, all currently sync'd to the same server.

In fact, via the web interface this seems to be the only way you can configure NTP. There is no per-node setting. The only time I recall specifying primary / secondary NTP per-node outside of this was during the initial install wizard.

Looks like there's no explicit way via the CLI either to specify NTP server priority.

svpisep01/admin(config)# ntp server ?
  <WORD>  NTP server, IP or HOSTNAME (Max Size - 255)

svpisep01/admin(config)# ntp server my.ntp.server ?
  key   Peer key number
  <cr>  Carriage return.

svpisep01/admin(config)# ntp server my.ntp.server key ?
  <1-65535>  

svpisep01/admin(config)# ntp server my.ntp.server key 1 ?
 <cr> Carriage return.

svpisep01/admin(config)#

I would vote to set them all to the same server, if for nothing else than to prove whether or not this is the issue.

It can be frustrating when TAC says "oh, we found this thing not configured to best practices. This 'could' be causing your issue", as oftentimes this is stated with little or no evidence that said best practice would prevent the issue. But if it's easy enough to rule out, I typically oblige.

ben.posner · ‎08-27-2018

all of mine are configured for 108 first and 109 second. but different nodes will choose one or the other seemingly at random. and there doesn't seem to be a 'prefer' keyword in ISE, or at least this version (2.3p4).

anthonylofreso · ‎08-28-2018

So, how is TAC recommending you set it up to ensure they're all using the same server then? Seems to me your configuration is probably correct. There's just not a whole lot to configuring NTP.

anthonylofreso · ‎08-28-2018

I went back and looked at some of my install notes. We ran into several issues, one of those being NTP. Here is what we noted (probably none of this is super relevant to your issue, but I'm sharing it as additional information):

NTP was not functioning initially. There was an "NTP Service Failure" alarm in the web dashboard. Did some troubleshooting, and eventually removed/re-added the NTP server which fixed the issue:

svpisea01/admin# conf t
Enter configuration commands, one per line.  End with CNTL/Z.
svpisea01/admin(config)# no ntp server
% Warning: Cannot remove NTP Server as at least one is required
svpisea01/admin(config)# ntp server 10.200.1.1
svpisea01/admin(config)# do sh ntp
Configured NTP Servers:
  10.200.1.1

synchronised to NTP server (10.200.1.1) at stratum 4
   time correct to within 10617 ms
   polling server every 64 s

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l    6   64    1    0.000    0.000   0.000
*10.200.1.1      12.168.68.1      3 u    5   64    1    0.779  2646.24   0.000

* Current time source, + Candidate , x False ticker

Warning: Output results may conflict during periods of changing synchronization.

svpisea01/admin(config)# exit
svpisea01/admin#

Note the message "Cannot remove NTP Server as at least one is required": despite the warning, it does let me remove it.
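
A side note on reading that output: in standard ntpq output the reach column is an 8-bit shift register shown in octal, where each bit records whether one of the last eight polls got a reply. A reach of 1 right after re-adding the server just means one poll has succeeded so far, while 377 means all of the last eight did. Here is a small Python sketch to decode it. The helper name `decode_reach` is made up for illustration, and I'm assuming ISE's 'show ntp' passes the ntpq value through unchanged.

```python
def decode_reach(reach_octal):
    """Decode an ntpq-style reach value (octal string) into poll history.

    Returns a list of 8 booleans, most recent poll first.
    """
    value = int(reach_octal, 8) & 0xFF
    # Bit 0 is the most recent poll; bit 7 is the oldest of the eight.
    return [bool((value >> bit) & 1) for bit in range(8)]


for reach in ("1", "377", "0"):
    history = decode_reach(reach)
    print(reach, "->", sum(history), "of the last 8 polls answered")
```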

FURTHER NOTES ON NTP

When it came time to integrate the first ISE server with Active Directory, I ran the AD Diagnostic Tool tests, and the NTP test failed with the following details:

Test Name          :System health - check NTP
Description        :Checks NTP configuration : available peers (which is not local) and time sync
Instance           :System
Status             :Warning
Start Time         :15:40:13 01.02.2017 EST
End Time           :15:40:33 01.02.2017 EST
Duration           :20 sec
Result and Remedy...
NTP client is not synchronized - could not connect to any peer.
This may impair Kerberos functionality! Please check NTP and network configuration
Below is more low level troubleshooting information, such as listing of unreachable peers.
Note that peers are actual servers and not necessary the same as high level definition in ntp.conf.

I found the following document that described my issue closely, though I'm not using a Windows server as my NTP box: issue. TAC case #: 681718069. Potentially relevant bug. Another potentially relevant bug.

This bug was the resolution attached to my TAC case: CSCux82480

hslai · ‎08-29-2018

Two NTP servers might not be good under some special circumstances, such as the one described in the Server Fault question "ntpd - Both my ntp servers are marked as falsetickers in the status". I do not think that applies to your case.

Anthony's points on NTP are valid, but CSCux82480 was fixed in ISE 2.0 Patch 4, 2.1 Patch 2, and 2.2 FCS, so it might not apply to your deployment.

I agree with you that the ISE nodes in the same deployment need not use the same set of NTP servers as long as the NTP servers used are fairly reliable and synchronizing to good references.
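
To put rough numbers on that, here is a back-of-the-envelope Python sketch using the offsets from the two 'show ntp' outputs posted earlier. It compares how far apart the two internal servers appear from each node against the Kerberos clock-skew tolerance. The 5-minute figure is the Active Directory default and is an assumption about this environment, and the function name is only illustrative.

```python
# AD default Kerberos tolerance (5 minutes); assumed unchanged in this environment.
KERBEROS_MAX_SKEW_MS = 5 * 60 * 1000


def server_disagreement_ms(offset_a_ms, offset_b_ms):
    """Apparent disagreement between two NTP servers as seen from one node.

    Each offset is (server time - local time), so the local clock cancels out.
    """
    return abs(offset_a_ms - offset_b_ms)


# Offsets in ms for 10.128.10.108 and 10.128.10.109, copied from the posted outputs.
observations = {
    "sopsiseadmin": (1.378, -3.839),
    "hzniseplc1": (2.257, -2.366),
}

for node, (off_108, off_109) in observations.items():
    gap = server_disagreement_ms(off_108, off_109)
    share = gap / KERBEROS_MAX_SKEW_MS
    print(f"{node}: servers differ by {gap:.3f} ms ({share:.6%} of the tolerance)")
```

On these numbers the disagreement is a few milliseconds against a tolerance measured in minutes, so which of the two servers a node happens to select should not matter for Kerberos by itself.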

I would suggest troubleshooting further with TAC. If you provide your case number, I can take a quick look.