02-21-2011 09:43 AM
We get this alarm quite frequently from one of our domain controllers, but we have had no reported issues other than this error. Can you shed some light on this error and what may be causing it to occur several times a day?
Monitoring Event: MON.DNS has occurred.
Event Details:
Event Id | 12528177
Event Type | MON.DNS
Event Time/Date | 2011-02-21 07:17:06
Event Message | CRITICAL on DNS on host Thunderbolt Network Manager Appliance (192.168.70.174) at 2011-02-21 7:17:04 -0600 - DNS 192.168.70.10 has no records
Customer Name |
Customer ID | 1287.ciscovar.com
Device ID | F0:AD:4E:00:0B:D9
Site ID | 1
Delay |
Yogi Yeager
02-21-2011 11:42 AM
Hi Yogi,
This event will happen any time the DNS client on Thunderbolt performs a name lookup (www.cisco.com by default) against the DNS servers used by the local LAN (either served via DHCP, or statically set on the TBA admin screen), and the nameserver does not return the record. You can change which target is resolved in the Portal by editing the TBA device itself and selecting the Monitors (tab) > DNS Service.
It's not surprising that this is the only indication you've received about the problem; this is what Thunderbolt is designed to do: detect errors before they become problems for your customers. Any user on the network at the time the event occurred would likely also have had trouble resolving against that particular DNS server, although most applications would just try an alternate nameserver (if available), so the user might never know that the failure had occurred. When TBA detects a service failure (such as with DNS here), it waits 1 minute and checks the service again. So the service has to fail two checks in a row, about 60 seconds apart, before the TBA decides that this is a true event and reports it up.
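To make the "check, wait, check again" behaviour concrete, here is a minimal Python sketch of that kind of confirmation logic. It is only an illustration of the idea described above, not the TBA's actual code; the function name `confirmed_failure` and its parameters are made up for this example.

```python
import time

def confirmed_failure(check, retry_delay=60, sleep=time.sleep):
    """Return True only if check() fails twice, retry_delay seconds apart.

    check is any callable returning True when the service responded.
    (Hypothetical helper; names are assumptions, not the TBA's API.)
    """
    if check():
        return False          # first check passed, nothing to report
    sleep(retry_delay)        # wait before re-testing the service
    return not check()        # report only if the second check also fails
```

The point of the second check is that a single dropped query never raises an event; only a failure that persists across the delay does.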
In this particular case, is the server perhaps down or overloaded (maybe performing a backup?) at the times the events are occurring? This one happened at 7:17 AM CST. If these events are being generated pretty regularly, it might be worth investigating the cause. If you decide that this issue is just noise and not really worth resolving, and it's not causing issues for the users at that site, you can also pause this monitor on the TBA so that you don't receive further notifications about it. Personally I'd take a look; it could be slowing down browsers with extra DNS lookups at times.
The Thunderbolt Deployment Guide says this about the MON.DNS event:
Monitors the amount of time in seconds for DNS
hostname resolution. If hostname resolution fails,
a Critical event is generated.
Specify the target DNS hostname for monitoring
DNS hostname resolution, set latency thresholds
in seconds for hostname resolution, and set the
severity level to use for Recovery, Warning, and
Critical Events.
03-07-2011 04:47 AM
With all due respect to Cisco's folks, we've seen and just learned to ignore this error for almost a year. It's never once been correct. Just because www.cisco.com doesn't resolve doesn't mean that the DNS server is down or slow. IMHO, you should change what it's looking up to an internal name and see if that helps, but even when we've done that we've had TBAs come back saying they can't resolve names. I had internal DNS logging set to max on several DNS servers and never found any errors other than what TBA sent us. We've seen this error from pretty much every TBA installed; if every customer's DNS servers were really having this problem, we'd be hearing complaints about browsing daily from every customer. I'm convinced the problem is elsewhere.
03-07-2011 08:26 AM
Hi Brian (and Yogi),
Transient DNS problems can be hard to pin down. Brian, you didn't say whether the DNS server's logs showed the queries from the TBA around the time that the event was reported - do you see ANY successful queries in the server logs at around the time of a failure event?
DNS servers tend to cache answers until the record's TTL expires, so possibly the failure is happening only when the local DNS server has to expire its cached entry and re-resolve. Based on the timestamp Yogi reported (7:17 AM), I was curious whether the events occur at customer locations only during off-hours, when typical office users wouldn't be around to notice (or report) the issue, perhaps during a period when the DNS server is under load for some other reason, such as a backup. Looking into Yogi's customer's events, I see large periods of time (> 1 hour) where the DNS service is unresponsive. This really looks like a valid issue at the site and deserves some attention.
You CAN change the target of the DNS resolution by editing the TBA device in the portal, going to the Monitors tab, and selecting the DNS Service monitor. You could try putting in a record that the local DNS server is authoritative for, to rule out upstream resolution problems. If you continue to receive failure-to-resolve events, you could try putting in localhost as the resolution target; this will cause the TBA to perform the resolution itself. If you still receive events for failing to resolve localhost, this would certainly be a bug with the TBA that would need to be addressed. Otherwise, the TBA is just a generic DNS client, and is reporting that the server it's asking isn't giving it the record it's asking for.
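The same elimination steps can be tried by hand from any other client on the LAN. The Python sketch below is only a rough illustration of that reasoning: the function names are made up, `dc01.corp.example` in the usage note is a placeholder for a record your local DNS server is authoritative for, and `socket.gethostbyname` goes through the operating system's resolver (hosts file, local cache, any configured nameserver), so it is not guaranteed to query the same server the TBA does.

```python
import socket

def resolves(name, resolve=socket.gethostbyname):
    """Return True if name resolves via the given resolver function."""
    try:
        resolve(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False

def localize(external, internal, resolve=socket.gethostbyname):
    """Walk the isolation steps in order and return a short diagnosis."""
    if not resolves("localhost", resolve):
        return "localhost failed: suspect the client itself"
    if not resolves(internal, resolve):
        return "authoritative record failed: suspect the local DNS server"
    if not resolves(external, resolve):
        return "only the external name failed: suspect upstream resolution"
    return "all targets resolved"
```

For example, `print(localize("www.cisco.com", "dc01.corp.example"))`. Because the problem is transient, a single run is only a snapshot; it is mainly useful if you can run it while an event is active.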
A DOS batch file or Unix shell script could be written pretty easily to perform a DNS query of the same target every 30 seconds, and run from another machine on the network. It could then be used to confirm or rule out that other DNS clients on the network are seeing the same issue. It would just need to print the date, perform the nslookup, wait 30 seconds, then loop, with the output redirected to a file. You could then go back through its log and compare what it recorded at the time the TBA saw and reported an event, to see whether it saw the same thing.
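As a rough illustration of that loop, here is a minimal sketch in Python rather than batch or shell. The function names and the `dns_check.log` file name are assumptions for this example. It also uses the operating system's resolver through `socket.gethostbyname` instead of calling nslookup, so an answer may come from a local cache or an alternate nameserver; if you need to query one specific DNS server, running nslookup with the server address as its second argument is the closer match to what is described above.

```python
import socket
import time
from datetime import datetime

def check_once(target, resolve=socket.gethostbyname):
    """Resolve target once and return a timestamped one-line log entry."""
    stamp = datetime.now().isoformat(timespec="seconds")
    start = time.monotonic()
    try:
        address = resolve(target)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return f"{stamp} FAIL {target} ({exc})"
    elapsed = time.monotonic() - start
    return f"{stamp} OK {target} -> {address} in {elapsed:.3f}s"

def run_checks(target, interval=30, logfile="dns_check.log", count=None,
               resolve=socket.gethostbyname, sleep=time.sleep):
    """Append one log line per check; loops forever when count is None."""
    done = 0
    while count is None or done < count:
        with open(logfile, "a") as log:
            log.write(check_once(target, resolve) + "\n")
        done += 1
        if count is None or done < count:
            sleep(interval)  # wait before the next lookup
```

Calling `run_checks("www.cisco.com")` from another machine on the same LAN (stop it with Ctrl+C) builds a log whose timestamps can be lined up against the event times the TBA reports.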