Understanding ICMP polling interval in NMS

David_Mitchell · ‎10-24-2013

Hi all,

My question rests on the following logic: a device can be polled successfully at 09:54:00, go off-line at 09:54:32, and come back on-line at 09:54:58, in time for the next NMS poll at 09:55:00. In that case the NMS would show no downtime at all.
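To make that reasoning concrete, here is a minimal sketch, assuming the NMS only learns about an outage when a scheduled poll lands inside it. The function name and the outage window (32 s to 58 s after the first poll) are hypothetical values chosen for illustration, not anything taken from a real NMS.

```python
def failed_polls(poll_times, down_start, down_end):
    """Return the poll times that fall inside the outage window.

    All times are in seconds after 09:54:00. The device is assumed
    to be unreachable for down_start <= t < down_end.
    """
    return [t for t in poll_times if down_start <= t < down_end]


# Assumed schedule: one poll every 60 seconds.
poll_times = [0, 60, 120]

# Outage that starts and ends between two polls.
print(failed_polls(poll_times, down_start=32, down_end=58))  # []

# Outage that is still in progress at the 60-second poll.
print(failed_polls(poll_times, down_start=32, down_end=61))  # [60]
```

An empty list means no poll failed, so under this assumption the outage would never appear in the reports.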

What I am seeing is the opposite. I think the device is being polled and failing to respond, then coming back on-line x seconds later and being polled successfully, and the NMS is reporting a downtime of 49 seconds (the difference between when it physically came back on-line and when it was next polled).

Does the SNMP agent have some internal mechanism to retain the up/down duration so that it can report it back to the NMS once it is on-line and being polled successfully again? Or is the 49-second downtime generated by the NMS itself, based on how long it has been since it last polled the device successfully? If it is the latter, why would the value be 49 seconds when the polling interval is 60 seconds?

Problem Specifics

My network monitoring is set up to poll every 60 seconds and has an ICMP threshold of 2 seconds.

As I understand it, this means that every 60 seconds a device is sent a ping and has up to 2 seconds in which to respond; otherwise it is classed as down.

Looking at the reports, I can see that a router did not respond and shows a down duration of 49 seconds, followed by an up time of 1 minute in the next entry. What I am trying to work out is how the NMS arrived at this 49-second value.

Suppose the NMS is set to ping the device every 60 seconds and the first ping at 09:54:00 succeeds, so the next poll is at 09:55:00. Let's say that the 09:55:00 ICMP poll is unsuccessful, and the device is recorded as down in the NMS. The NMS would then poll again at 09:56:00 to see whether the device is back on-line. In my case it was, yet the total downtime shows as 49 seconds.

Should this not always be 1 minute or multiples thereof?
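The arithmetic behind that expectation can be sketched as follows, assuming the NMS polls strictly once per interval and measures downtime from the first failed poll to the next successful one. The function `reported_downtime` is a hypothetical helper for illustration only and does not describe how any particular NMS calculates its figures.

```python
def reported_downtime(results, interval=60):
    """Return downtime durations in seconds for a fixed polling schedule.

    results holds one boolean per scheduled poll (True = reply received).
    Downtime is assumed to run from the first failed poll to the next
    successful poll.
    """
    durations = []
    down_since = None
    for i, ok in enumerate(results):
        t = i * interval  # time of this poll in seconds
        if not ok and down_since is None:
            down_since = t  # outage first noticed here
        elif ok and down_since is not None:
            durations.append(t - down_since)
            down_since = None
    return durations


# 09:54 up, 09:55 down, 09:56 up
print(reported_downtime([True, False, True]))         # [60]
# 09:54 up, 09:55 down, 09:56 down, 09:57 up
print(reported_downtime([True, False, False, True]))  # [120]
```

Under these assumptions every reported value is a multiple of 60, so a figure of 49 seconds cannot come from the regular schedule alone.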

Could it be that the NMS, once it detects that a device is down, steps up its polling of that device with repeated polls for a specified period until it gives up and reverts to its regular schedule? That is, it pings every second and on the 49th second it gets a response?
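As a rough check of that hypothesis, here is a small simulation. It assumes (without confirmation from any NMS documentation) that after a failed regular poll the NMS retries once per second until the device replies, and that downtime is counted from the failed poll to the first successful retry. The function name, parameters, and outage times are all made up for illustration.

```python
import math


def downtime_with_fast_retry(down_start, down_end, interval=60, retry=1):
    """Simulate fast retries after a failed regular poll.

    Times are in seconds after 09:54:00; the device is unreachable for
    down_start <= t < down_end. Returns a tuple
    (failed_poll, first_reply, duration), or None if the outage ends
    before any regular poll can notice it.
    """
    # First regular poll at or after the start of the outage.
    failed_poll = math.ceil(down_start / interval) * interval
    if failed_poll >= down_end:
        return None  # device recovered before the poll
    t = failed_poll
    while t < down_end:
        t += retry  # assumed fast retry, once per `retry` seconds
    return failed_poll, t, t - failed_poll


# Device drops at 09:54:50 and recovers at 09:55:48.5 (made-up times).
print(downtime_with_fast_retry(50, 108.5))  # (60, 109, 49)
```

With these made-up times, the poll at 09:55:00 fails and the retry at 09:55:49 succeeds, giving 49 seconds. This only shows that the hypothesis is consistent with the report, not that the NMS actually behaves this way.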

Thanks in advance

David