
ISE very high authentication latency caused by DNS resolution?

Johannes Luther
Level 4

Hi ISE professionals,
we experienced very high ISE 3.1 P1 authentication latencies (10+ seconds) at busy times during the day. However, the ISE is nowhere near its scaling limits regarding concurrent sessions etc. The situation was so severe that, for example, WLCs assumed the ISE was dead (RADIUS timeout) and switched to the second ISE.

This is caused by a "Remote Logging Target" (Syslog UDP) to which we send our logs (Splunk). The remote logging target was configured using the FQDN. When we configured the remote logging target using the IPv4 address instead, the problem was immediately resolved.

Has anybody experienced similar issues and can explain this? Thinking about it, the following questions pop up in my head:

a) Is DNS really the only issue here? Obviously, DNS queries are blocking RADIUS packets in some kind of internal queue in the ISE. By default the ISE does not perform DNS caching, so every Syslog message results in one DNS query (including DNS RTT).
Does it make sense, on a multi-threaded system, to perform these tasks (DNS, logging) in the same thread as RADIUS authentication (which is crucial for this box!)? A syslog message must never influence something like RADIUS from a processing point of view.

b) Did I miss something regarding ISE best practices like: "Never configure the remote syslog target as FQDN, because it will reduce your overall scalability"?
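
To make the concern in (a) concrete, here is a minimal Python sketch of the cost difference between resolving the syslog target for every message and resolving it once. It is purely illustrative and says nothing about how ISE is implemented internally; `slow_resolve`, the 2 ms delay, the hostname, and the address are all hypothetical stand-ins.

```python
import time


def slow_resolve(hostname, delay=0.002):
    """Hypothetical stand-in for an uncached DNS lookup that costs `delay` seconds."""
    time.sleep(delay)
    return "192.0.2.10"  # documentation address (RFC 5737), not a real server


def send_resolving_each_time(hostname, messages, resolve=slow_resolve):
    """Resolve the target before every message; returns the number of lookups."""
    lookups = 0
    for _ in messages:
        resolve(hostname)  # one DNS round trip per syslog message
        lookups += 1
        # the UDP send itself is omitted in this sketch
    return lookups


def send_resolving_once(hostname, messages, resolve=slow_resolve):
    """Resolve the target once, then reuse the address; returns the number of lookups."""
    resolve(hostname)
    for _ in messages:
        pass  # the UDP send itself is omitted in this sketch
    return 1


if __name__ == "__main__":
    msgs = ["auth passed"] * 100
    for fn in (send_resolving_each_time, send_resolving_once):
        start = time.perf_counter()
        count = fn("splunk.example.com", msgs)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {count} lookups, {elapsed:.3f} s")
```

If whatever thread does the first variant also has to serve RADIUS, every lookup delay adds directly to the time a request waits, which is the behaviour I suspect here.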

 

Best regards

Johannes

Accepted Solutions

Arne Bier
VIP

Hi Johannes,

Behind the scenes ISE makes a lot of DNS resolution queries, and by default ISE does not cache DNS responses. If you capture traffic on an ISE Admin node with Wireshark for 60 minutes with a UDP/53 filter applied, you'll be surprised. The queries come in bursts and I have seen multiple repetitive queries in very short succession. The answer is of course to enable the DNS caching that was introduced in 2.7 (I think). It allows ISE to calm down a bit and honour the DNS TTL. For DNS responses that don't have a TTL, ISE can apply a fixed value TTL for you. I have now applied this to all of my customers and have had no complaints. The command below is entered in conf t on all ISE nodes to reduce the DNS query spam:

service cache enable hosts ttl 3600

As for using IP addresses instead of DNS - I am fundamentally against that concept because I believe that DNS is the answer. If your SYSLOG servers are likely to never change their IP address, then perhaps the argument to use DNS is weak. If the IP address is cached in ISE with a TTL of 3600 seconds applied, then you will see at most 24 queries per day for that name. That's not too bad in my opinion.
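
The arithmetic behind the 24 queries per day can be checked with a small Python sketch of a TTL cache. This is only a model under stated assumptions, not ISE's actual cache: `TtlCache`, `queries_per_day`, the hostname, and the address are hypothetical, and the clock is simulated so that one lookup happens every second for 24 hours.

```python
class TtlCache:
    """Minimal TTL cache in front of a resolver function (illustrative only)."""

    def __init__(self, resolve, ttl_seconds, clock):
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # hostname -> (address, expiry time)
        self.upstream_queries = 0

    def lookup(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh, no DNS query needed
        self.upstream_queries += 1  # cache miss or expired entry
        address = self._resolve(hostname)
        self._entries[hostname] = (address, now + self._ttl)
        return address


def queries_per_day(ttl_seconds, lookup_interval_seconds=1):
    """Count upstream queries over 86400 simulated seconds for one hostname."""
    now = [0]  # simulated clock, advanced by the loop below
    cache = TtlCache(lambda hostname: "192.0.2.10", ttl_seconds, lambda: now[0])
    for t in range(0, 86400, lookup_interval_seconds):
        now[0] = t
        cache.lookup("splunk.example.com")
    return cache.upstream_queries


if __name__ == "__main__":
    print(queries_per_day(3600))  # 24
    print(queries_per_day(1))     # 86400, effectively no caching
```

So even with one syslog message per second, a one-hour TTL turns 86400 lookups into 24.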


@Arne Bier is right.

I even made DNS caching an example for ISE CLI with Ansible for Configuration : Enable DNS caching

 

Johannes Luther
Level 4

Hi all,

thank you for your replies. Just out of curiosity - what TTL value do you configure for DNS queries that are not resolved or where there is no TTL in the response?

@Arne Bier : I see it the same way ("I believe that DNS is the answer"). The funny thing is that when you configure an FQDN in the remote logging target config, a warning message pops up (I don't know in which version Cisco started to show this message - I'm sure this warning was not there in ISE 2.4):

[Screenshot: warning message shown when an FQDN is configured for the remote logging target]

No doubt that, without DNS caching, this is totally inefficient and makes little sense, BUT what I don't expect is that DNS resolution of syslog targets fills up the same internal queues as the RADIUS process. This is something I don't understand. In most switching platforms, there are dedicated queues for at least network control and best effort stuff... Because RADIUS (and TACACS+) are crucial applications for the ISE, I would expect them to have dedicated threads.

So we may argue that DNS is important for authentication purposes as well (especially if there is an AD backend auth or an OCSP lookup), but that a logging lookup can break the whole thing is hard to understand from my point of view.

Hopefully, if either caching is deployed or IP addresses are used for remote logging, the syslogs are not processed in the same queue / thread as the RADIUS traffic.
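
What I would expect, sketched in Python: the request path only puts the log message on a bounded queue, and a dedicated worker thread does the slow part (DNS, sending). This is a generic pattern under my own assumptions and not a description of ISE internals; `AsyncLogger`, `deliver`, and `max_pending` are hypothetical names.

```python
import queue
import threading


class AsyncLogger:
    """Hands log messages to a worker thread so the caller never waits on delivery."""

    def __init__(self, deliver, max_pending=1000):
        self._deliver = deliver  # slow function: resolve target, send datagram, ...
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, message):
        """Called from the request path; returns immediately."""
        try:
            self._pending.put_nowait(message)
        except queue.Full:
            self.dropped += 1  # drop the log line rather than delay the request

    def _run(self):
        while True:
            message = self._pending.get()
            if message is None:  # sentinel placed by close()
                break
            self._deliver(message)

    def close(self):
        """Flush everything queued so far and stop the worker."""
        self._pending.put(None)
        self._worker.join()


if __name__ == "__main__":
    sent = []
    logger = AsyncLogger(sent.append)
    for i in range(3):
        logger.log(f"auth event {i}")
    logger.close()
    print(sent)
```

With such a design a slow or failing DNS lookup for the syslog target would at worst cost log messages, not RADIUS response time.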

@thomas : Yes, I have known about DNS caching since your ISE CLI with Ansible document, which I read in the WebEx ISE Team. Very cool document!