09-05-2017 05:03 PM - edited 03-01-2019 06:06 PM
Ok, let's face it: not all organizations can hire people to look at alerts all day or have a fully staffed NOC. So I am looking to start a conversation with other engineers on "what we monitor and why" and also the challenges of monitoring. Whether it be Cisco Prime, SolarWinds, or any one of the million other products out there, they all serve the same purpose.
I find myself walking into an organization where, after 3 days, I have created email filters to send all alerts to my deleted folder because it's just TOO much. Alert when the interface goes down, alert when it comes back up, alert when the 90 percent threshold has been crossed and another alert when it drops below the threshold... Bounce on any of the legacy T1 lines, alert! Any spike in bandwidth on a circuit, alert! I almost feel we set alerts for alerts.
Is it really worth monitoring thresholds on a T1 when all your phone and Internet traffic is pushed through it? I mean, ok, the alert goes off and what can you really do about it?!?! It's not like I can increase the T1 speed at the touch of a key on my PC.
I almost want to just scrap everything and start over. Make a checklist that asks: what is it? Do we care? What do we monitor? Do I really want to know the up/down status on access ports of a Cisco switch? OH PLEASE NO! I get about 1500 emails a day. I mean, at what point does too much monitoring become ineffective?
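(For what it's worth, one way I've been tempted to kill the access-port up/down noise is to stop it at the source rather than filter it in the NMS. A rough sketch on a Cisco IOS switch; the interface range is a hypothetical example, and you'd leave traps enabled on the uplinks you actually care about:)

```
! Rough sketch only -- interface range is a hypothetical example.
! Suppress link up/down SNMP traps and syslog messages on access ports
! at the source, so the NMS never sees the churn. Keep traps enabled
! on uplinks and other ports you actually want to hear about.
interface range GigabitEthernet1/0/1 - 48
 no snmp trap link-status
 no logging event link-status
```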
So like I said, this isn't a question or a "please solve this"; I just want to get ideas from people who have walked into this before or still struggle with it. How did/do you approach it? Share some ideas and/or insights. It's not one size fits all.
09-05-2017 10:09 PM
- I use NAGIOS for monitoring network infrastructure, perimeter, servers and storage infrastructure.
>can hire people to look at alerts all day
Monitoring systems, as good as they are and whatever product is used, mostly fail on the sentence quoted above, because watching the alerts is the core effort that keeps any monitoring system 'alive'. That doesn't mean people need to be hired for it, but in the end someone must take on a dedicated operator role.
M.
09-06-2017 05:20 AM
I agree completely. I have worked at places where the helpdesk monitors these things, and if something doesn't look right to them they drop a ticket. But that requires training the help desk individuals. It just gets to the point where too much monitoring is not efficient. If I get 1500 emails a day, the chances are that an important/critical alert is going to get lost in there somewhere.
09-06-2017 05:28 AM
- That leads to a very elaborate discussion involving many factors; the simplest answer is: only monitor critical things. This is already where the relationship between the maintainer of the monitoring solution and the operators gets defined. Monitoring levels don't stand on their own either; they are related to the environments they operate in. A bank or a hospital operating room may require monitoring of ALL events, while a student college facility may only be interested in critical alerts, ... not writing the rest of the book now ... :-)
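As a concrete illustration of "only monitor critical things" in NAGIOS terms, a service can be tuned so that only hard CRITICAL states and recoveries generate notifications, with flap detection suppressing up/down churn. A minimal sketch; the host, command, and values below are made-up examples, not a recommendation:

```
# Sketch only -- host, command, and values are hypothetical examples.
define service {
    use                      generic-service
    host_name                core-sw01
    service_description      Uplink utilization
    check_command            check_snmp_traffic
    max_check_attempts       4      ; four failed checks before a hard state
    notification_options     c,r    ; notify on CRITICAL and recovery only
    flap_detection_enabled   1      ; suppress notifications while flapping
    first_notification_delay 15     ; wait before the first notification
}
```

The same knobs (hard-state retries, notify-on-critical-only, flap suppression, notification delay) exist in most NMS products under different names.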
M.