09-05-2017 05:03 PM - edited 03-01-2019 06:06 PM
Ok, let's face it: not all organizations can hire people to look at alerts all day or have a fully staffed NOC. So I am looking to start a conversation with other engineers on "what we monitor and why" and also the challenges of monitoring. Whether it be Cisco Prime, SolarWinds, or any one of the million other products out there, they all serve the same purpose.
I find myself walking into an organization where, after 3 days, I have created email filters to send all alerts to my deleted folder because it's just TOO much. Alert when the interface goes down, alert when it comes back up, alert when the 90 percent threshold has been crossed and another alert when it drops below the threshold... Bounce on any of the legacy T1 lines, alert! Any spike in bandwidth on a circuit, alert! I almost feel we set alerts for alerts.
Is it really worth monitoring thresholds on a T1 when all your phone and Internet traffic is pushed through it? I mean, ok, the alert goes off and what can you really do about it?!?! It's not like I can increase the T1 speed at the touch of a key on my PC.
I almost want to just scrap everything and start over. Make a checklist that asks: what is it? Do we care? What do we monitor? Do I really want to know the up/down status on access ports of a Cisco switch? OH PLEASE NO! I get about 1500 emails a day. I mean, at what point does too much monitoring become ineffective?
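(For what it's worth, one way I've been tempted to kill the access-port up/down noise is to stop it at the source rather than filter it in the NMS. A rough sketch on a Cisco IOS switch; the interface range is a hypothetical example, and you'd leave traps enabled on the uplinks you actually care about:)

```
! Rough sketch only -- interface range is a hypothetical example.
! Suppress link up/down SNMP traps and syslog messages on access ports
! at the source, so the NMS never sees the churn. Keep traps enabled
! on uplinks and other ports you actually want to hear about.
interface range GigabitEthernet1/0/1 - 48
 no snmp trap link-status
 no logging event link-status
```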
So like I said, this isn't a question or a "please solve this"; I just want to get ideas from people who have walked into this before or still struggle with it. How did/do you approach it? Share some ideas and/or insights. It's not one size fits all.
09-05-2017 10:09 PM
- I use NAGIOS for monitoring network infrastructure, perimeter, servers and storage infrastructure.
>can hire people to look at alerts all day
Monitoring systems, as good as they are and whatever product is used, mostly fail on the sentence quoted above, because watching the alerts is the core effort that keeps any monitoring system 'alive'. That doesn't mean people need to be hired for it, but in the end someone must take on a dedicated operator role.
M.
09-06-2017 05:20 AM
I agree completely. I have worked at places where the helpdesk monitors these things, and if something doesn't look right to them they drop a ticket. But that requires training the help desk individuals. It just gets to the point where too much monitoring is not efficient. If I get 1500 emails a day, the chances are that an important/critical alert is going to get lost in there somewhere.
09-06-2017 05:28 AM
- That leads to a very elaborate discussion involving many factors; the simplest answer is: only monitor critical things. This is already where the relationship between the maintainer of the monitoring solution and the operators gets defined. Monitoring levels don't stand on their own either; they are related to the environments they operate in. A bank or a hospital operating room may require monitoring of ALL events, while a student college facility may only be interested in critical alerts, ... not writing the rest of the book now ... :-)
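As a concrete illustration of "only monitor critical things" in NAGIOS terms, a service can be tuned so that only hard CRITICAL states and recoveries generate notifications, with flap detection suppressing up/down churn. A minimal sketch; the host, command, and values below are made-up examples, not a recommendation:

```
# Sketch only -- host, command, and values are hypothetical examples.
define service {
    use                      generic-service
    host_name                core-sw01
    service_description      Uplink utilization
    check_command            check_snmp_traffic
    max_check_attempts       4      ; four failed checks before a hard state
    notification_options     c,r    ; notify on CRITICAL and recovery only
    flap_detection_enabled   1      ; suppress notifications while flapping
    first_notification_delay 15     ; wait before the first notification
}
```

The same knobs (hard-state retries, notify-on-critical-only, flap suppression, notification delay) exist in most NMS products under different names.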
M.