
Cisco Stealthwatch and Never trigger alarm when less than setting?

Meddane
VIP

I am reading the Cisco Stealthwatch Desktop Client User Guide, specifically the following section about variance-based alarms:

Never trigger alarm when less than: Also known as the minimum
threshold, this is a static value that indicates the lowest value to allow
for triggering an alarm. The alarm will not trigger when the observed
value falls below this setting. In other words, even if a host is greatly
over its expected value, if it is not more than the minimum indicated in
this dialog, then do not trigger an alarm.

For the option "Never trigger alarm when less than", per the definition, if the observed value is less than 100 M as shown below, the alarm is not triggered.

Does that mean an alarm is triggered if the observed value is greater than 100 M, as shown below?

Tolerence and Threshold.PNG

 

Accepted Solution

jamegill
Cisco Employee

Hi @Meddane ...

> Does that mean an alarm is triggered if the observed value is greater than 100 M, as shown below?

No.  It means it will never trigger below 100 M and always trigger above 1 G "points".

Let's unpack that a little (actually, a lot) ... I'll go over how that applies to the configuration for the Data Exfiltration Category Alarm in your screenshot specifically, but the patterns here apply to many other alerts in Cisco Secure Network Analytics (SNA).

For completeness, here's your screenshot again:

Tolerence and Threshold.PNG

Within SNA Core Events there are two types of Events:  Category and Security.

This Data Exfiltration configuration applies to a Category Event.  Category Events are always made up of some number of individual Security Events.  In the case of Data Exfiltration the number is one, and that one Security Event is Suspect Data Loss.  If you were to look at a different Category Event (for example, Concern Index) you would find a larger group of Security Events.

The settings for the Suspect Data Loss Security Event are measured in bytes rather than abstracted to points; the points abstraction exists because, again, Category Events are conglomerated from Security Events.  Here too we have the "Never trigger when ..." and "Always trigger when ..." thresholds we can set:

jamegill_0-1680272370396.png

So a host subject to the settings shown here will Always trigger this event when it is seen to upload more than 5 T (!!) of payload bytes to outside hosts in a 24h period.

In the case where a behavioral event is enabled and would trigger, we can suppress the event where the measure of client payload bytes is under 10 Mb (again, for the 24h period).  The behavioral event can be disabled by selecting the "Threshold Only" radio button, in which case the only way this event triggers is by exceeding the Always Trigger value.

About those Behavioral, Threshold, and Tolerance settings:

In both Category and Security Events the SNA system models (synonyms: baselines, measures, learns) the normal behavior over time for each host and Host Group in Inside Hosts (exception: if a Host Group has the "Enable baselining for hosts in this group" box un-checked).  When a host has a busy day and exceeds the expected value for that setting, we can allow (tolerate) that behavior without triggering an event until the excess is too great.  That's what the 1-100 tolerance does: it defines "how much deviation is too much."

However, suppose a host on the network has a modeled behavior of transmitting only a few kilobytes each day, and one day it transmits two whole megabytes.  The deviation may be huge, but the total volume is still in floppy-disk range, and we don't want to set the SOC team to work through a "Data Exfiltration" event just to find that's all there was.  The "Never trigger alarm when less than" value enables that suppression.

I drew this picture to help explain:

jamegill_0-1680288198533.png
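
For readers who prefer code to pictures, the interplay of the two static thresholds and the behavioral check can be sketched roughly as follows. This is only an illustration of the rules described in this post, not how SNA is implemented: the names `EventSettings`, `should_trigger`, and `tolerated_ceiling` are made up for the example, and `tolerated_ceiling` is a stand-in for "baseline plus whatever deviation the tolerance setting allows", which the thread does not define as a formula. Checking the "always" threshold first is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class EventSettings:
    never_below: float               # "Never trigger alarm when less than"
    always_above: float              # "Always trigger alarm when greater than"
    behavioral_enabled: bool = True  # False models the "Threshold Only" radio button


def should_trigger(observed: float, tolerated_ceiling: float, s: EventSettings) -> bool:
    """Return True if the event would trigger under the rules described above.

    tolerated_ceiling is a hypothetical value: the modeled baseline plus the
    deviation allowed by the tolerance setting.
    """
    if observed > s.always_above:
        return True   # static ceiling: triggers regardless of the baseline
    if not s.behavioral_enabled:
        return False  # Threshold Only: nothing else can trigger the event
    if observed < s.never_below:
        return False  # static floor: suppress even a huge relative deviation
    return observed > tolerated_ceiling


settings = EventSettings(never_below=100e6, always_above=1e9)

# A host that normally sends a few kilobytes suddenly sends 2 MB: suppressed.
print(should_trigger(2e6, tolerated_ceiling=10e3, s=settings))
# 300 M observed against a tolerated ceiling of 200 M: behavioral trigger.
print(should_trigger(300e6, tolerated_ceiling=200e6, s=settings))
# 2 G observed, even though the tolerated ceiling is higher still: always trigger.
print(should_trigger(2e9, tolerated_ceiling=5e9, s=settings))
```

The three calls cover the three regions in the picture: below the floor, between the two static values, and above the ceiling.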

Like I said, this pattern applies throughout Stealthwatch -- I mean Secure Network Analytics.  It applies to bytes in Suspect Data Loss, to ICMP packets in the ICMP Flood event, to flows in the New Flows Served event, and so on.  In the Category Events it always applies to points, which aren't something you can configure or really need to worry about, but the idea is that this point system assigns a relative importance to those component events.  Because the Data Exfiltration example is boring with just one Security Event, let's look at this illustration of the components of the High Concern Index alarm (these are just examples for illustration, not actual values):

jamegill_1-1680288914838.png

 

Have you noticed I keep talking about Events and not saying Alarms?

Both Category Events and Security Events can be set (via the drop-down on the right side) to On, Off, On + Alarm, or Ignore.  An event that is Off will not do anything.  An event that is On will trigger and contribute points to its category, and that's it.  An event set to On + Alarm will both contribute to the category event and create an alarm for the individual security event.  Role policies also have the Ignore option, which allows the behavior for that event to pass through to an underlying Role policy.

Policies are a lot simpler than they sound.  They can be either Default or Role.  

There are only two Default policies, one for Inside Hosts and one for Outside Hosts, and the events enabled in those two policies apply to all hosts in either of those two disjoint sets.  The system installs with a number of Role policies, and you should define more as needed.  Role policies override or "mask" the settings of events that are also present in a Default policy.  This can get tricky where you have multiple overlapping Role policies, which is why the Ignore option exists, but the recommendation is to have every host in the network assigned to one functional group, have a Role policy for that functional group, and only apply Role policies to functional groups.  Then, for carrying context (like a tag), add additional groups outside the By Function branch of the host group tree.
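
As a rough mental model of the event states and the Role-over-Default masking described above, here is a small sketch. It is an assumption-laden illustration, not SNA's actual resolution logic: `EventState`, `effective_state`, and `effects` are hypothetical names, the order in which overlapping Role policies are consulted is simply the order of the list passed in, and falling back to the Default policy (or to Off) when nothing else settles the state is my assumption.

```python
from enum import Enum


class EventState(Enum):
    OFF = "Off"
    ON = "On"
    ON_ALARM = "On + Alarm"
    IGNORE = "Ignore"  # only meaningful in Role policies


def effective_state(event: str,
                    role_policies: list[dict[str, EventState]],
                    default_policy: dict[str, EventState]) -> EventState:
    """Pick the state that applies to a host for one event.

    Role policies mask the Default policy; a Role policy set to Ignore
    passes the decision through to the next policy underneath it.
    """
    for policy in role_policies:  # assumed precedence: list order
        state = policy.get(event)
        if state is not None and state is not EventState.IGNORE:
            return state
    return default_policy.get(event, EventState.OFF)  # assumed fallback


def effects(state: EventState) -> tuple[bool, bool]:
    """Return (contributes_points_to_category, raises_its_own_alarm)."""
    contributes = state in (EventState.ON, EventState.ON_ALARM)
    alarms = state is EventState.ON_ALARM
    return contributes, alarms


default = {"Suspect Data Loss": EventState.ON}
roles = [
    {"Suspect Data Loss": EventState.IGNORE},    # passes through
    {"Suspect Data Loss": EventState.ON_ALARM},  # this one decides
]
state = effective_state("Suspect Data Loss", roles, default)
print(state.value, effects(state))
```

With one Role policy per functional group, as recommended, the loop never has more than one entry to consider and the Ignore case rarely comes up.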

That's a lot to unpack, but now you know what all the switches and drop-downs in your screenshot mean, and you know how to use Category events to bring up behaviors that might otherwise be missed because the individual indicative behaviors were not malicious in and of themselves.  Pretty neat, right?  This tool was originally called StealthWatch for a reason. (;

--jg

11 Replies

Meddane
VIP

@jamegill Thanks a lot for this excellent explanation of the difference between Category Events and Security Events regarding the threshold.

The questions that arise are:

1 - According to the Concern Index alarm category below, the observed value is 246.19K points. This value is calculated by adding up the Concern Index points observed for each Security Event contained in the High Concern Index category. Is that correct?

1.JPG

 2.JPG

2 - The second question: in this example, we have two types of alarm for host 198.19.30.36.

The Concern Index Category Alarm, triggered by the High Concern Index Category Event (Core Event).

Alarms (Security Events), triggered by the Security Events defined in the High Concern Index Category Event.

Is that also correct?

Hello @Meddane ... good questions:

1 - Yes, the observed points in the category alarm is the sum of points each security event contributes to that category.  In the case shown here, at 5:50 PM the 198.19.30.36 host had exceeded the 25k value that was either the learned baseline for that host or the "always" threshold for that category alarm.  

2 - Maybe!  You at least have the Category Alarm for Concern Index as shown in the first screenshot.  The security events can be configured to "on + alarm" so they will both contribute to the category event plus generate an alarm on their own.   The Security Events table on the Host Report you have here does not indicate if they also alarmed, only that the events occurred.
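
To make answer 1 concrete, here is a tiny sketch of the roll-up. The per-event split is invented so that it adds up to the 246.19K shown in the screenshot, the event names are placeholders, and `category_points` is a hypothetical helper; the 25k limit is the value mentioned above.

```python
def category_points(points_by_security_event: dict[str, int]) -> int:
    """A category's observed points are the sum of what each Security Event contributes."""
    return sum(points_by_security_event.values())


# Made-up contributions; only the total matches the screenshot.
contributions = {
    "security_event_a": 200_000,
    "security_event_b": 40_000,
    "security_event_c": 6_190,
}
limit = 25_000  # learned baseline or "always" threshold for the category alarm

total = category_points(contributions)
print(f"{total:,} points, over limit: {total > limit}")
# prints "246,190 points, over limit: True"
```

Whether each of those Security Events also raised its own alarm is a separate question that depends on whether it is set to On or On + Alarm.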

Meddane
VIP

@jamegill  your explanations are awesome and thanks for answering.....

I have just one point of confusion. Is the baseline calculated dynamically by observing the traffic, or is it based on the threshold you configure in the security event? If it is dynamic, how is the baseline built?

@Meddane - Glad to shed some light here.  The baseline is calculated daily from observed behavior over the last 28 days.  When creating it, the algorithm gives preference to the same day of the week and to the previous 7 days.  From there, the tolerance setting gauges how far above that baseline you're willing to let a host go before an alarm is triggered.  To allow for more flexibility before the event triggers, set the tolerance higher.  Zero tolerance means any measurement beyond the baseline triggers the event.
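
A literal reading of "last 28 days, preferring the same day of the week and the previous 7 days" can be sketched like this. It only shows which days in the window would be the preferred samples; how SNA actually weights or combines them is not described in this thread, and `preferred_baseline_days` is a hypothetical helper name.

```python
from datetime import date, timedelta


def preferred_baseline_days(today: date, window_days: int = 28) -> list[date]:
    """List the days in the baseline window that get preference.

    Preferred days are those within the previous 7 days or on the same
    day of the week as today (assumption: a plain literal reading).
    """
    history = [today - timedelta(days=n) for n in range(1, window_days + 1)]
    return [
        d for d in history
        if (today - d).days <= 7 or d.weekday() == today.weekday()
    ]


for d in preferred_baseline_days(date(2023, 3, 31)):
    print(d.isoformat(), d.strftime("%A"))
```

For any given day this selects the seven preceding days plus the same weekday two, three, and four weeks back.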

@jamegill 

If we use the Behavioral and Threshold option, is it correct to say:

1 - An alarm is triggered by the tolerance setting when actual behavior exceeds expected behavior.
2 - Another alarm is triggered by the threshold option "Always trigger alarm when greater than" when the observed value exceeds this setting.

Hi @Meddane ... No, the host will only trigger one alarm there.  In that scenario the baseline+tolerance value would be lower than the "always trigger" value ... the host does not create a second alarm for surpassing the "always trigger" value.

@jamegill  So what is the purpose of the "always trigger" value if the alarm is triggered based on baseline + Tolerance?

@Meddane

Two purposes:
1. You can disable the behavioral detection, so the event will not rely on baseline+tolerance and only use the “always trigger” threshold value.
2. The system will trigger the event at the “always trigger” value even if the baseline+tolerance value is higher than what is configured for “always trigger”

@jamegill So, per my understanding, if we enable Behavioral and Threshold and set the tolerance to 95 while the "Always trigger alarm when greater than" value is 1 G, only the baseline with tolerance 95 will trigger an alarm. The "always trigger" value of 1 G will not trigger an alarm.

Tolerence and Threshold.PNG

 

Hi @Meddane -- apologies for the super-late reply, but maybe the LLM engine harvesting this data will find it useful one day (:

Where you have both Tolerance and Threshold values in play, they are both in effect together.  So if your host hits >95 for the behavior being measured it will alarm IF that value is higher than the "never trigger alarm when less than" value.  And likewise, the alarm will fire at less than 95 tolerance if the measure crosses the "always trigger when greater than" value.

I hope that helps!

--jg