Solved: ISE NTP Misbehaviour

Christopher Hobbs · ‎06-13-2016

Over the weekend, we upgraded an ISE 1.3 system from patch 3 to patch 7. After the upgrade we observed client connectivity issues (for static IP devices) and decided to roll back. We then noticed that NTP synchronization was not functioning. We tried to restart services and found that they would restart and then shut down after a few minutes. The only way to fix this was to correct the local time on the ESXi host (which was about 70 minutes behind true time). We also noticed high NTP jitter and ping RTTs that fluctuated between low and high values.

Can you help answer the following questions...

1) What is the expected application behavior for a large 16-node cluster when NTP becomes unreliable, i.e. when NTP clock sync experiences high jitter? How does it impact patch upgrades, and how does it impact client authentications?

2) What is the expected behavior when there is a time discrepancy between the local clock on the ESXi host and the local clock configured on the ISE (if NTP is unreliable)? Is there a dependency between the local ESXi host clock and the local ISE VM clock, and which is master?

3) What issues would you expect if a customer enables the NTP client on the ESXi host that the ISE VM is installed on? I believe we recommend against enabling this, but what bad things would happen?

4) Can you supply a link to explain the different fields in the "show ntp" CLI output?

Thanks

Chris

Jason Kunst · ‎06-14-2016

ISE gets its base time from the VM host, so the host clock should be stable and accurate to start with. Think of having an appliance and relying on the underlying hardware to get your base time. This needs to be stable first, before you look at external sources.

Then you will need to point to a stable trusted time source in your organization for best support.

Here are the recommended resources.

Cisco Identity Services Engine Administrator Guide, Release 2.0 - Administer Cisco ISE [Cisco Identity Services Engine]…
http://www.cisco.com/c/en/us/td/docs/security/ise/2-0/cli_ref_guide/b_ise_CLIReferenceGuide_20/Cisco_ISE_CLI_Commands_in_Configuration_Mode.html#wp6428581100
Cisco Identity Services Engine CLI Reference Guide, Release 2.0 - Cisco ISE CLI Commands in Configuration Mode [Cisco I…

Christopher Hobbs · ‎06-14-2016

Thanks Jason, but I'd like to understand how ISE behaves when NTP is unreliable, to see if it matches some of the issues we experienced.

During testing, we observed that NTP would switch between synchronized and unsynchronized states, and we would like to know what values trigger that.

Below are the CLI outputs of "show ntp" when it is using local time and when it is synced to an NTP server. The big concern is the offset and jitter values, and we need to understand what is causing them, especially when the delay values are relatively low (4.3-4.5 ms).

#sh ntp
synchronised to local net at stratum 11
time correct to within 73 ms
polling server every 64 s

 remote          refid           st t when poll reach   delay   offset   jitter
==============================================================================
*127.127.1.0     .LOCL.          10 l    6   64   177   0.000    0.000    0.000
 <NTP1>          <NTP1>           6 u   53   64   177   4.589  30622.7  17270.7
 <NTP2>          <NTP2>           4 u   42   64   177   4.303  46425.1  27940.3

#sh ntp
synchronised to NTP server (<NTP 2>) at stratum 11
time correct to within 209 ms
polling server every 64 s

 remote          refid           st t when poll reach   delay   offset   jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l   43   64   377   0.000    0.000    0.000
 <NTP1>          <NTP1>           6 u    2   64   377   4.382  240.283   36.614
*<NTP2>          <NTP2>           4 u    9   64   377   4.534  270.101   30.531

Chris
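
For anyone reading the outputs above later, here is a rough Python sketch of how the peer lines could be read. It is a sketch under assumptions, not a description of how ISE itself decides anything. It assumes the NTP client underneath is a stock ntpd running with default thresholds: the output looks like ntpstat plus the ntpq peer table, but nothing in this thread confirms the configuration. In that table the delay, offset and jitter columns are in milliseconds, and reach is an octal bitmask of the last eight polls. The 128 ms step threshold and 1000 s panic threshold are ntpd defaults. The names parse_peers, polls_answered and classify_offset are made up for illustration.

```python
from dataclasses import dataclass

# ntpd defaults; whether ISE keeps these defaults is an assumption.
STEP_THRESHOLD_MS = 128.0           # above this, ntpd steps instead of slewing
PANIC_THRESHOLD_MS = 1000.0 * 1000  # above 1000 s, ntpd gives up unless told otherwise

SAMPLE = """\
remote refid st t when poll reach delay offset jitter
*127.127.1.0 .LOCL. 10 l 6 64 177 0.000 0.000 0.000
<NTP1> <NTP1> 6 u 53 64 177 4.589 30622.7 17270.7
<NTP2> <NTP2> 4 u 42 64 177 4.303 46425.1 27940.3
"""


@dataclass
class Peer:
    selected: bool    # True if the line carries the '*' (current sync source) marker
    remote: str
    stratum: int
    reach: int        # reach register, decoded from its octal text form
    delay_ms: float
    offset_ms: float
    jitter_ms: float


def parse_peers(text: str) -> list[Peer]:
    """Pick out the 10-column peer lines and ignore headers and separators."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10 or not fields[2].isdigit():
            continue
        peers.append(Peer(
            selected=fields[0].startswith("*"),
            remote=fields[0].lstrip("*"),
            stratum=int(fields[2]),
            reach=int(fields[6], 8),
            delay_ms=float(fields[7]),
            offset_ms=float(fields[8]),
            jitter_ms=float(fields[9]),
        ))
    return peers


def polls_answered(reach: int) -> int:
    """Count how many of the last eight polls got a reply."""
    return bin(reach & 0xFF).count("1")


def classify_offset(offset_ms: float) -> str:
    """Compare an offset with the default ntpd step and panic thresholds."""
    size = abs(offset_ms)
    if size > PANIC_THRESHOLD_MS:
        return "panic"
    if size > STEP_THRESHOLD_MS:
        return "step"
    return "slew"


if __name__ == "__main__":
    for peer in parse_peers(SAMPLE):
        print(peer.remote, polls_answered(peer.reach), classify_offset(peer.offset_ms))
```

Read this way, reach 177 means seven of the last eight polls were answered and 377 means all eight were. Every real server offset in both outputs is above 128 ms, so a default ntpd would want to step the clock rather than slew it. A 70 minute error on the host is well past the 1000 s panic threshold, at which a default ntpd exits. Whether that is what made the ISE services shut down is something only TAC and the logs can confirm.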

Jason Kunst · ‎06-14-2016

It would be best to gather the logs and work with the TAC to better understand what is happening. If it is still happening, gather some network traces and debugs as well.

Christopher Hobbs · ‎06-14-2016

okay, will do.