Author / Co-author: Miguel Cervantes & Jose Manuel Huerta (@josemhue)

    General Overview 

In this post, we explore the world of ThousandEyes alerts, focusing on the challenge of managing “noisy” alerts for Cloud and Enterprise Agents.

    For users who have already set up their alert rules but feel overwhelmed by the constant notifications, this document offers practical tips, best practices, and straightforward strategies to fine-tune alert settings for smoother monitoring and increased productivity. 

    Background 

    There are two distinct types of conditions in an alert rule: global alert conditions and location alert conditions. 

    A global alert condition represents a triggering event, signifying that all conditions specified in the alert have been met, thereby activating the alert. 

    Conversely, a location alert condition denotes a qualifying event, indicating that only some conditions have been met, yet they still qualify as part of the global event. 
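As a rough mental model (plain Python, not ThousandEyes product code, with made-up metric values), a location condition acts as a per-agent qualifying check, while the global condition is the triggering decision built on top of those qualifying events:

```python
# Mental model only (not product code): a location condition is a per-agent
# "qualifying" check; the global condition is the "triggering" decision.
def qualifies(agent_metrics):                      # location alert condition
    return agent_metrics["loss_pct"] >= 5

def triggers(round_metrics, min_qualifying=2):     # global alert condition
    qualifying_agents = [a for a, m in round_metrics.items() if qualifies(m)]
    return len(qualifying_agents) >= min_qualifying

round_metrics = {"agent-a": {"loss_pct": 8}, "agent-b": {"loss_pct": 0},
                 "agent-c": {"loss_pct": 6}}
print(triggers(round_metrics))   # True: two agents qualify, so the alert fires
```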

Below is a depiction of where both global and location alert conditions are situated:

    migcerva_0-1714686306061.png

    With that in mind, let’s explore the common triggers and recommended techniques for managing "noisy" notifications:

     

Global Alert Condition Parameters:

    Let's analyze each key parameter as demonstrated below: 

    migcerva_1-1714686350849.png

    1) All Vs. Any: 

    migcerva_2-1714686406459.png

When multiple conditions are assigned to an alert rule, the advisable strategy is to select “All conditions....” This evaluates all agents assigned to the test configuration against the alert rule and ensures that every location alert condition is met before the alert can trigger.
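A minimal sketch of the difference (illustrative Python only, thresholds made up): "All conditions" behaves like all() across the location conditions, while "Any condition" behaves like any(), which qualifies agents far more often:

```python
# Illustrative comparison of "All conditions" vs. "Any condition" (not product code).
conditions = [
    lambda m: m["latency_ms"] > 100,   # location condition 1
    lambda m: m["loss_pct"] > 2,       # location condition 2
]

agent_round = {"latency_ms": 130, "loss_pct": 0}   # latency is bad, loss is fine

meets_all = all(c(agent_round) for c in conditions)   # False -> no contribution
meets_any = any(c(agent_round) for c in conditions)   # True  -> qualifying event

print(meets_all, meets_any)  # False True
# "Any" turns a single borderline metric into a qualifying event, which is why
# it tends to be noisier than requiring all conditions to be met.
```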

    2) “Any of” Vs. “the same” condition:

    migcerva_3-1714686438387.png

    Consider adjusting the setting from "any of" to "the same". This modification activates "sticky" agent settings, potentially reducing alert noise.  

Keeping the setting on "any of" can result in more frequent alert notifications.
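The following sketch (illustrative only, not product code) shows why "the same" behaves like a sticky-agent filter: it only counts agents that violate in every recent round, whereas "any of" counts an agent the moment it violates in a single round:

```python
# Illustrative "any of" vs. "the same" agent evaluation across rounds.
# Each round lists the agents whose location conditions were met in that round.
rounds = [
    {"agent-nyc", "agent-lon"},   # round 1
    {"agent-nyc"},                # round 2
    {"agent-nyc", "agent-sgp"},   # round 3
]

any_of = set().union(*rounds)          # any agent that violated in any round
the_same = set.intersection(*rounds)   # only agents violating in every round ("sticky")

print(sorted(any_of))    # ['agent-lon', 'agent-nyc', 'agent-sgp']
print(sorted(the_same))  # ['agent-nyc']
# With "the same", transient one-round violations (agent-lon, agent-sgp) no longer
# count toward the trigger, which reduces alert noise.
```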

    3) % of agents Vs. numerical value (agent):

    migcerva_4-1714686467636.png

     

Rather than relying on a specific numerical value for this parameter, we recommend employing a threshold based on “% of agents”, especially when the number of agents assigned to a test is expected to change.

    Using a fixed numerical value may require frequent adjustments, particularly if there are fluctuations in the assignment of agents to the test.  

    Conversely, opting for a percentage-based threshold provides increased flexibility and adaptability, automatically adjusting to variations in the agent count without necessitating manual interventions. 
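A quick worked example (plain Python, illustrative numbers) shows why the percentage threshold keeps working as the agent count changes while a fixed count does not:

```python
# Illustrative comparison of a fixed agent count vs. a percentage threshold.
def triggers_fixed(violating, fixed_threshold=3):
    return violating >= fixed_threshold

def triggers_pct(violating, total_agents, pct_threshold=50):
    return (violating / total_agents) * 100 >= pct_threshold

# 3 of 6 agents violating: both forms trigger.
print(triggers_fixed(3), triggers_pct(3, 6))    # True True

# The test later grows to 20 agents; 3 violating is now only 15% of the fleet.
print(triggers_fixed(3), triggers_pct(3, 20))   # True False
# The fixed count still fires (noise), while the percentage threshold scales
# automatically and needs no manual adjustment.
```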

     

    4) “<> out of <> times in a row”:  

This applies mostly to alerts assigned to tests running at shorter intervals (e.g., 1, 2, or 5 minutes).

The suggested course of action is to consider at least 3 out of 3 times in a row, to make sure multiple rounds are evaluated before the alert gets triggered.

For tests running at longer intervals, we recommend the following:

    Screenshot 2024-05-02 at 2.50.11 PM.png
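Conceptually, “3 out of 3 times in a row” is a debounce over the most recent test rounds. Here is a minimal sketch of that idea (illustrative Python only; the actual evaluation is performed by the ThousandEyes platform per alert rule):

```python
from collections import deque

# Illustrative "N out of M rounds" debounce (not product code).
def make_round_tracker(n_required=3, out_of=3):
    history = deque(maxlen=out_of)           # sliding window of recent rounds
    def record_round(conditions_met: bool) -> bool:
        history.append(conditions_met)
        # Trigger only once the window is full and enough rounds violated.
        return len(history) == out_of and sum(history) >= n_required
    return record_round

check = make_round_tracker(n_required=3, out_of=3)
print(check(True))    # False - only 1 violating round so far
print(check(True))    # False - 2 rounds
print(check(True))    # True  - 3 violating rounds in a row, alert triggers
# A one-round blip (True, False, True) would never fire with 3-of-3,
# which is exactly the noise reduction we are after on 1-5 minute tests.
```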

     

Location Alert Condition Parameters:

    Note: Please be aware that the example images provided in this section are illustrative configurations only, meant to demonstrate best practice recommendations. 

    The global and local alert conditions may be different and built based on individual customer requirements. For a tailored alerting strategy suited to your specific use case, we recommend consulting the ThousandEyes documentation and/or reaching out to your ThousandEyes representative.

     

    Using just the “Dynamic” baseline: 

    Dynamic baseline alerts utilizing standard deviation can generate significant noise for metrics with a small or highly stable average. 

When using a dynamic baseline, make sure you complement it with other specific condition(s), as in the example shown below (adding packet loss and latency location conditions):

     

    migcerva_5-1714686715129.png
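To see why a standard-deviation baseline on its own can be noisy for a small, stable metric, consider the rough sketch below (plain Python, made-up latency values). Pairing the baseline with absolute conditions keeps operationally meaningless deviations from alerting:

```python
import statistics

# Illustrative dynamic-baseline check (not the actual ThousandEyes algorithm).
recent_latency_ms = [4.9, 5.0, 5.1, 5.0, 4.9, 5.1, 5.0]   # very stable, very low
mean = statistics.mean(recent_latency_ms)
stdev = statistics.stdev(recent_latency_ms)

new_sample = 5.4  # still an excellent latency in absolute terms
baseline_violation = new_sample > mean + 2 * stdev
print(round(mean, 2), round(stdev, 2), baseline_violation)  # 5.0 0.08 True

# Complementing the baseline with absolute conditions filters this out:
loss_pct = 0.0
alert = baseline_violation and new_sample > 100 and loss_pct > 2
print(alert)  # False - the deviation is statistically real but operationally meaningless
```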

    Using Default Alert rules with no tuning: 

Most default alerts adhere to "standard" parameters outlined in this documentation, but each customer's requirements are unique. To be more granular than the default alert rules, it is highly advisable to duplicate these rules and customize each location alert condition accordingly.

Additionally, it is crucial to understand the current Service Level Agreements (SLAs) and key performance indicators (KPIs) specific to each use case:

• The test type to which the alert will be assigned.
• The metric that is causing false positive (FP) alerts.

Tip: If no specific baseline has been established previously, it is suggested to analyze the mean value of the metric of interest for the location alert condition over the past few days.
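For instance, here is a minimal sketch (plain Python, made-up values) of turning a few days of metric history into a first-pass threshold; it assumes the samples were already exported from ThousandEyes, for example via the API or a report download:

```python
import statistics

# Minimal sketch: derive a starting threshold from recent history (illustrative).
# Assumes latency_samples_ms holds per-round values exported from ThousandEyes
# for the agents/tests the rule will cover over the past few days.
latency_samples_ms = [42, 45, 44, 43, 47, 41, 44, 46, 43, 45]

mean = statistics.mean(latency_samples_ms)
suggested_threshold = round(mean * 1.5)   # simple margin; tune to your SLA/KPI

print(f"mean={mean:.1f} ms, suggested location condition: latency > {suggested_threshold} ms")
```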

Alternatively, if multiple tests are to be included in the same alert rule, an additional common condition shared among the grouped tests can be incorporated. For instance:

Latency > 100 ms (original and only condition)

    To make the alert less “sensitive,” we can add a new condition which must be met together with the previous location condition. An example of adding a 2nd condition is shown below: 

     

    migcerva_6-1714686761463.png

     

    Additional suggestions:

Perform regular alert “cleanup” tasks; avoid duplicate alerts:

Be sure to check for alert rules with duplicated matching conditions and thresholds. This review helps identify and remove unnecessary duplicates, improving the efficiency of the alerting process.
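If you manage many rules, a small script can help surface candidates for cleanup. The sketch below groups rules by their condition expression; the endpoint, authentication scheme, and response field names are assumptions based on our reading of the ThousandEyes v7 API, so verify them against the current developer documentation before relying on them:

```python
import requests
from collections import defaultdict

# Sketch: flag alert rules that share the same expression so duplicates can be
# reviewed and merged. Endpoint, auth, and field names below are assumptions.
API_URL = "https://api.thousandeyes.com/v7/alerts/rules"   # assumed endpoint
TOKEN = "<your-bearer-token>"                               # placeholder

resp = requests.get(API_URL, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()
rules = resp.json().get("alertRules", [])                   # assumed response key

by_expression = defaultdict(list)
for rule in rules:
    # Group rules by their condition expression (assumed field names).
    by_expression[rule.get("expression", "")].append(rule.get("ruleName"))

for expression, names in by_expression.items():
    if len(names) > 1:
        print(f"Possible duplicates for '{expression}': {names}")
```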

    Use Alert Suppression: 

The ThousandEyes Alert Suppression Window lets users mute alerts for specified conditions or events during a set period. This is useful for avoiding unnecessary alerts during maintenance windows, or at other times when you neither require nor desire to receive alerts.

Assign proper recipient(s):

Assigning alerts to designated recipients (email, webhook, integration, etc.) ensures that only relevant stakeholders receive notifications, preventing unnecessary interruptions for individuals not directly involved in resolving the issue.

Segment alerts per region/agents depending on the assigned test and its SLAs/KPIs:

When adding an alert to a test that has multiple agents assigned across separate locations or regions, we advise configuring separate alert rules, each tailored to the specific conditions of the respective agents' location. This approach is necessary because the SLA/KPI may differ depending on the vantage point.

For instance, if the test involves agents located on different continents, it may be necessary to segment and create location-specific conditions for each scenario. These specific alert rules can then be applied to the test accordingly. For example:

     

    migcerva_7-1714686806354.png

     

In this situation, we have a test featuring agents in both North America and Asia, with distinct SLAs/KPIs for each region. Below are some examples of how alerts for the test configuration above can be configured by region:

    migcerva_8-1714686827279.png

     

    migcerva_9-1714686839560.png
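As a purely illustrative summary (the thresholds, agent names, and field names below are examples only, not ThousandEyes API fields), the region-segmented approach might look like this:

```python
# Illustrative sketch of region-segmented alert rules (values are hypothetical).
region_rules = [
    {
        "name": "HTTP test - North America agents",
        "agents": ["Dallas, TX", "Toronto, Canada"],
        "location_conditions": {"latency_ms": 80, "loss_pct": 2},
        "global_condition": {"pct_of_agents": 50, "rounds": "3 of 3"},
    },
    {
        "name": "HTTP test - Asia agents",
        "agents": ["Singapore", "Tokyo, Japan"],
        "location_conditions": {"latency_ms": 180, "loss_pct": 2},
        "global_condition": {"pct_of_agents": 50, "rounds": "3 of 3"},
    },
]

for rule in region_rules:
    cond = rule["location_conditions"]
    glob = rule["global_condition"]
    print(f'{rule["name"]}: alert when latency > {cond["latency_ms"]} ms '
          f'and loss > {cond["loss_pct"]}% on {glob["pct_of_agents"]}% '
          f'of agents, {glob["rounds"]} rounds in a row')
```

Each region gets thresholds that reflect its own SLA/KPI, and each rule is assigned only to the agents in that region.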
