
SRV record found. Not all SRV records have IP, will need to run additional query for get IP.

MNBob
Level 1

First of all, this post has nothing to do with https://community.cisco.com/t5/identity-services-engine-ise/srv-record-found-not-all-srv-records-have-ip-will-need-to-run/td-p/3927645.  I do not have too many domain controllers or SRV records and the response is not getting truncated.  I have brought my case to TAC, and they cannot seem to get the above mentioned truncation issue out of their head - so I am looking for some assistance here.

 

I have scheduled my ISE servers to run Active Directory diagnostics every night to ensure that the connection to Active Directory is healthy.  Tests executed during this diagnostic routine are as follows:

 

DNS A record high level API query  

DNS A record low level API query  

DNS SRV record query  

DNS SRV record size  

Kerberos check SASL connectivity to AD  

Kerberos test bind and query to ROOT DSE  

Kerberos test obtaining join point TGT  

LDAP test - DC locator  

LDAP test - GC locator  

LDAP test AD site association  

LDAP test DCs availability  

LDAP test DCs response time  

System health - check AD service  

System health - check DNS configuration  

System health - check NTP  

 

All tests come back as successful – except the “DNS SRV record query” test:

 

Warning: SRV record found. Not all SRV records have IP, will need to run additional query for get IP.

 

The reason for this warning is:

 

Our DNS servers have the “Minimal-Responses” option set to “True” (see https://tools.ietf.org/html/rfc8482).  This limits the response to the ISE query so that it does not include the IP addresses of the AD domain controllers.  A quick Google search for DNS and “Minimal Responses” shows that “True” is the default setting in Infoblox and BIND, and keeping that setting seems to be standard practice.
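To make the mechanism concrete, here is a minimal Python sketch of the kind of check the diagnostic appears to perform: compare the SRV targets in the answer against the address records that came back in the additional section. The data layout, the host names, and the function name `targets_without_address` are assumptions for illustration only; how ISE actually implements the test is not documented in this thread.

```python
from typing import Dict, List


def targets_without_address(srv_targets: List[str],
                            additional: Dict[str, List[str]]) -> List[str]:
    """Return the SRV targets that have no address in the additional section."""
    def norm(name: str) -> str:
        # DNS names are case-insensitive; ignore the trailing root dot.
        return name.rstrip(".").lower()

    have_address = {norm(name) for name, addrs in additional.items() if addrs}
    return [t for t in srv_targets if norm(t) not in have_address]


# Placeholder names and documentation-range addresses (hypothetical).
targets = ["dc1.example.org.", "dc2.example.org."]

# A "full" response: the server volunteers A records alongside the SRV answer.
full = {"dc1.example.org.": ["192.0.2.10"], "dc2.example.org.": ["192.0.2.11"]}

# A minimal response: same SRV answer, but an empty additional section.
minimal: Dict[str, List[str]] = {}

print(targets_without_address(targets, full))     # nothing missing
print(targets_without_address(targets, minimal))  # every target needs a follow-up A query
```

With a minimal response every target lands in the "missing" list, even though the SRV answer itself is complete and valid.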

 

So what is the big deal:

 

The rather innocuous warning message above, in 1 out of 15 diagnostic tests, causes a system-wide “Warning” level alert to be generated, stating:

 

Active directory diagnostic tool found issues - One or more Active Directory diagnostic tests failed during a scheduled run.

 

So now we have a situation where an expected response during a scheduled diagnostic test generates a “failed” alert to our operations staff, of a significance that would typically generate an incident report (and possibly a page-out).

 

The only mitigation steps I can think of both seem to be unacceptable:

 

  1. I could reduce the severity of the “Active Directory Diagnostics tests failed” alert; but then I will not be alerted if one of the other 14 diagnostic tests fails.
  2. I could just “not run” a scheduled AD diagnostic test – but then I lose the ability to ensure that the ISE AD connection is healthy.

 

Is there anything else I can do here?  Can I somehow remove this test from the schedule, change the test, etc.?
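As an illustration of the behaviour being asked for, the following Python sketch evaluates a diagnostic run while ignoring a named set of known-benign tests. This is not an ISE feature or API: the result format, the status strings, and the `KNOWN_BENIGN` set are all hypothetical, and only the test names come from the list above.

```python
from typing import AbstractSet, Dict

# Hypothetical allowlist of tests whose warnings are expected in this environment.
KNOWN_BENIGN: AbstractSet[str] = frozenset({"DNS SRV record query"})


def overall_status(results: Dict[str, str],
                   ignore: AbstractSet[str] = frozenset()) -> str:
    """Return 'Failed' if any test outside `ignore` is not 'Successful'."""
    problems = [name for name, status in results.items()
                if status != "Successful" and name not in ignore]
    return "Failed" if problems else "Successful"


# Abbreviated, made-up result set for one scheduled run.
results = {
    "DNS A record high level API query": "Successful",
    "DNS SRV record query": "Warning",
    "LDAP test DCs availability": "Successful",
}

print(overall_status(results))                # one warning marks the whole run as failed
print(overall_status(results, KNOWN_BENIGN))  # the expected warning is ignored
```

The other tests still raise the alert if they fail; only the allowlisted test is excluded from the overall result.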



Jason Kunst
Cisco Employee

If TAC doesn't have any ideas, then please escalate it. This forum is not a replacement for TAC. I will see if any ideas can be forwarded around. Did you look at this?

 

What's new in ISE Active Directory connector - BRKSEC-2132
Chris Murray, Technical Leader, Cisco
Cisco Identity Services Engine (ISE) integrates with Active Directory using a new connector. We will introduce new features, concepts and troubleshooting tools as well as Best Practices to help you avoid and resolve issues. This session is a pre-requisite to any ISE deployment when you have been deploying multiple Active Directory in your Company.

Thanks for the reply - and the link.  It was nice to get a refresher on the AD connector - though it really has nothing to do with the issue here.

 

The issue actually is as follows:

  • A single step of a 15-step ISE diagnostic routine is assuming that a valid and expected response from a DNS query is an error condition - and it is generating a false warning alert - not just for that subset - but for the ENTIRE diagnostic routine.

  • The false warning significantly reduces the value of the diagnostic routine, in that it requires human analysis to decide whether the alert means that an error condition actually exists or whether the false positive is the only issue being reported.

Also note:  I am not trying to use this forum as a replacement for TAC.  I was guessing that this issue was somewhat prevalent - given that we are using DNS default settings - so I was wondering if anybody else might know a strategy for suppressing 1/15th of a diagnostic routine that I could not find.

 

I will take your advice and try again with TAC - though I am certain they will point to the truncation issue again and I will try to fight my way through it.

 

Hi @MNBob 

 

Interesting indeed. I learned something about DNS through your post. I hope the TAC can assist you. 

The AD integration in ISE is generally very well thought out and coded, but it needs some updating. For example, if I join one node to AD, then I must join ALL remaining nodes to AD to avoid the alarm “ISE not joined to AD”. You cannot suppress that without turning off all AD integration alarming (a bad idea). A simple switch to tune the AD alarms would be nice. I recall putting in a feature request ages ago.

Right, and it is best to work with TAC to get cases logged, and to use http://cs.co/ise-feedback to get the feedback to the PMs.

I have a different take on this "error".  It's not an error, per se, but it is perhaps an inefficiency.  I have had this issue in multiple versions of ISE, and I've essentially ignored it for at least 4 years.  Look at what the error says:  It says it could not find an IP, and that a second query would need to be run to find it. A forward DNS query looks up a name, in this case, the name of a particular SRV record, and it expects to get an IP address as the response.  That's just the way DNS works.  If you look up a name, and you get another name, you then need to look up that second name to get an IP.

Assuming that SRV records exist in your DNS for each of your AD Domain controllers, you need to look at the actual data in the records.  For example,  if you have a SRV record for _ldap._tcp.dc._msdcs.foobar.org, it will have entries for Priority, Weight, Port, and Target, the Target being one of the Domain Controllers in your domain.  You'll have one of these records for each DC.  The AD guys in my group put the FQDN of the Domain Controller in the Target field.  I'm suggesting that you enter in the IP address of the Domain Controller in the field for each of the _ldap._tcp.dc._msdcs and possibly the _kerberos._tcp.dc._msdcs SRV records.

If you use DNS to perform a lookup and it responds with a name, it then has to perform a lookup on the response to get the IP.  That's two lookups, and that's inefficient.  I mentioned this once to our AD guys, and they looked at me like I had two heads.  I believe it's a quirk of Microsoft.  It wouldn't be the first time Microsoft didn't follow best practices.

So, in brief, you have a bunch of SRV records that exist to allow lookups to find one of many Domain Controllers.  When using AD as an External Identity Source, ISE expects to find a _ldap._tcp.dc._msdcs.<domain> record and a _kerberos._tcp.dc._msdcs.<domain> for each DC.  If these records exist, and the target is the FQDN of each DC, I'm suggesting you replace the FQDN with the IP address of the DC.  I'd love to hear alternative takes on this.  
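The two-step lookup described above can be sketched in Python with an in-memory stand-in for the zone. Everything here is an assumption for illustration: the DC host names, the addresses (documentation range), and the `locate_dcs` helper. SRV targets are modelled as host names, which is what RFC 2782 defines the Target field to hold, and ordering is by priority only, without the weighted selection a real client would apply.

```python
from typing import Dict, List, Tuple

# Hypothetical zone data: (priority, weight, port, target) per SRV record.
SRV: Dict[str, List[Tuple[int, int, int, str]]] = {
    "_ldap._tcp.dc._msdcs.foobar.org": [
        (0, 100, 389, "dc1.foobar.org"),
        (0, 100, 389, "dc2.foobar.org"),
    ],
}
A: Dict[str, str] = {
    "dc1.foobar.org": "192.0.2.10",
    "dc2.foobar.org": "192.0.2.11",
}


def locate_dcs(service: str) -> Tuple[List[Tuple[str, int]], int]:
    """Resolve a service name to (ip, port) pairs and count the lookups used."""
    lookups = 1  # the SRV query itself
    endpoints: List[Tuple[str, int]] = []
    for _priority, _weight, port, target in sorted(SRV[service]):
        lookups += 1  # follow-up A query for the target host name
        endpoints.append((A[target], port))
    return endpoints, lookups


endpoints, lookups = locate_dcs("_ldap._tcp.dc._msdcs.foobar.org")
print(endpoints)
print(lookups)
```

In this model, two domain controllers cost three lookups: one SRV query plus one A query per target. When a server includes the A records in the additional section of the SRV response, the follow-up queries are unnecessary, which is the case the diagnostic seems to expect.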

joseponceiii
Level 1

@MNBob Hi, just wondering if you found any solution to this? Though this doesn't affect any of the services currently, it does however continuously generate this same alert warning message that you are experiencing (1/15 tests warning) and our AD diagnostic tool is also scheduled to run every day. Hope you can share your solution if you've found any. 

 

Running ISE 2.7 patch 2.

 

Thanks,

Unfortunately no solution yet.  Just hoping Cisco puts it into a future release.

Hi There,

We are now at version 3.1 Patch 5 and are still getting this alert.

I have a few days to spare so I'll log a case with TAC and post the results here.

 

drichards21
Level 1

Any answers yet?  I am on ISE 3.0 p7 and have had this issue as long as I can remember going back to at least 2.4 if not the original ISE 1.0.  It would be nice to get a fix for this.

I had hoped this would go away in version 3.x and above.  Also, this blog post has been around for a while and goes over the issue.

https://www.lookingpoint.com/blog/cisco-ise-ad-diagnostic-srv-record-query-alert

Arne Bier
VIP

@drichards21 that's a brilliant article that explains the issue really well. The original developer who worked on the ISE AD section was a top bloke - I think there is a Cisco Live session somewhere in the archives - and I always wonder if he still works at Cisco. The trick is to get the attention of Cisco to address this issue. Feature Request, TAC case or both.

HQuest
Level 1

As helpful as the article @drichards21 linked is, it does not truly explain why you will still see this warning while bound to a forest with a single AD server and a 130-byte message size:

# nslookup -type=srv _ldap._tcp.dc._msdcs.domain.com 192.168.9.200
Server: 192.168.9.200
Address: 192.168.9.200#53

_ldap._tcp.dc._msdcs.domain.com service = 0 100 389 w2k19.domain.com.

# dig @192.168.9.200 -t srv _ldap._tcp.dc._msdcs.domain.com +edns

; <<>> DiG 9.19.17 <<>> @192.168.9.200 -t srv _ldap._tcp.dc._msdcs.domain.com +edns
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50388
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 2156a6b47d716f7b01000000652d4e6a73c60d81bb5f7e1c (good)
;; QUESTION SECTION:
;_ldap._tcp.dc._msdcs.domain.com. IN SRV

;; ANSWER SECTION:
_ldap._tcp.dc._msdcs.domain.com. 600 IN SRV 0 100 389 w2k19.domain.com.

;; Query time: 0 msec
;; SERVER: 192.168.9.200#53(192.168.9.200) (UDP)
;; WHEN: Mon Oct 16 10:53:30 EDT 2023
;; MSG SIZE rcvd: 130


#_

There must be more on this topic than just the EDNS or UDP packet size, and if TAC gave this as a suggestion, I would challenge it.
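One detail in the dig output above is easy to check mechanically: the header reports ADDITIONAL: 1, but with EDNS enabled that count includes the OPT pseudo-record, and no ADDITIONAL SECTION with A records is printed. The Python sketch below parses dig text output to separate the two. The function name `summarize_dig` and the returned keys are assumptions for illustration, and the sample is an abbreviated copy of the output above.

```python
import re
from typing import Dict


def summarize_dig(output: str) -> Dict[str, int]:
    """Extract section counts from dig text output and count address records."""
    m = re.search(r"ANSWER: (\d+), AUTHORITY: (\d+), ADDITIONAL: (\d+)", output)
    if m is None:
        raise ValueError("no dig header line found")
    answer, _authority, additional = (int(x) for x in m.groups())
    # With EDNS, the OPT pseudo-record is counted in the ADDITIONAL total.
    opt = 1 if "OPT PSEUDOSECTION" in output else 0
    # Address records look like: "name. TTL IN A 192.0.2.1" (or AAAA).
    addresses = re.findall(r"^\S+\s+\d+\s+IN\s+(?:A|AAAA)\s+\S+",
                           output, re.MULTILINE)
    return {
        "answer": answer,
        "additional": additional,
        "additional_without_opt": additional - opt,
        "address_records": len(addresses),
    }


sample = """\
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;_ldap._tcp.dc._msdcs.domain.com. IN SRV

;; ANSWER SECTION:
_ldap._tcp.dc._msdcs.domain.com. 600 IN SRV 0 100 389 w2k19.domain.com.
"""

summary = summarize_dig(sample)
print(summary)
```

For this response, the only additional record is the OPT pseudo-record, so the SRV answer arrives without an address for w2k19.domain.com. That is independent of message size or the number of domain controllers.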

Arne Bier
VIP

I have to admit that in my lab, ISE 3.2p3 and two Windows Server 2016 Standard domain controllers (patched to the eyeballs) I don't get this AD Healthcheck issue. But I do see it with my customers who have a much larger number of domain controllers.

Hi Arne! 

I have the same deployment in production, and I am seeing the issue, just FYI. I will open a TAC case and keep you all posted.

acazarez
Level 1

Hello All,

This is what I got from Cisco TAC

=====================================================================================

"Hello, Antonio,

I have checked internally and identified that the issue you’re experiencing is known in version 3.x, especially when the domain controllers (DC) count is five or more.

The alarm is just a warning, and it indicates that part of the SRV records does not contain the server's IP address; it contains only FQDN. Hence, ISE will need to run additional queries (for 'A' records) to get the DCs' IP addresses.

There is a cosmetic bug and filed as enhancement, not yet fixed. https://bst.cisco.com/bugsearch/bug/CSCwb93856 ENH: ISE 3.X: Improve logic for "DNS SRV record query" AD alarm.

You may choose to disregard the warning for now. Additionally, you can subscribe to the enhancement linked above to receive updates.

===================================================================

I am planning to upgrade from 3.2 P3 later this year. I just migrated all my deployments from ESXi to AHV, so I am kind of tired of chasing this ghost now.

I hope this works for someone.