Beginner

Bosun Alert

Hello,

I received a critical bosun alert from the cluster.  What should I do?

1 REPLY
Cisco Employee

Bosun is used on the Tetration cluster to monitor hundreds of metrics covering various aspects of the system.  When a defined threshold is crossed, it may generate a ‘critical’ event and an email alert.  The email contains details that should tell you what happened, when, why and potentially what it means; the same details are available on the Sentinel page under Monitoring to users with the Customer Support role.  Once the metric falls back below the defined limit for that alert, a ‘normal’ email should be generated.
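
To make the critical/normal cycle above concrete, here is a minimal sketch in Python.  It is not Bosun's code or its configuration syntax; the class, the function, the metric name and the 90.0 limit are all hypothetical and chosen only for illustration.  It assumes an email is sent only when the state changes, and that a value exactly at the limit counts as normal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ThresholdAlert:
    """Hypothetical model of one alert rule; not Bosun's real implementation."""
    name: str
    critical_above: float      # assumed limit; values above it are critical
    state: str = "normal"

    def observe(self, value: float) -> Optional[str]:
        """Return the new state if this sample changes it, else None."""
        new_state = "critical" if value > self.critical_above else "normal"
        if new_state == self.state:
            return None        # no transition, so no email would be sent
        self.state = new_state
        return new_state


def notifications(alert: ThresholdAlert, samples: List[float]) -> List[Tuple[int, str]]:
    """Collect (sample index, new state) for every state change."""
    events = []
    for i, value in enumerate(samples):
        change = alert.observe(value)
        if change is not None:
            events.append((i, change))
    return events


if __name__ == "__main__":
    # Hypothetical metric and limit, for illustration only.
    disk = ThresholdAlert(name="disk_used_percent", critical_above=90.0)
    print(notifications(disk, [70.0, 95.0, 97.0, 85.0]))
    # prints [(1, 'critical'), (3, 'normal')]
```

With the sample values shown, one ‘critical’ notification is produced when the value first exceeds the limit and one ‘normal’ notification when it drops back; the sample that stays above the limit produces nothing further.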
 
We have seen some issues with alerts being sent that don’t necessarily mean there is an impact on the performance of the cluster.  As of the 2.1.1.31 release, many of these issues have been corrected, but we are still seeing a couple of alerts that fire when there is no noticeable impact; for example, CSCvg49095.
 
If you are trying to understand the health of the cluster, I would advise using the Cluster Status and Service Status pages, which are under the Maintenance page in the UI.  These typically give a more accurate picture of cluster health in a summary view.
 
Bosun alerts can be silenced through the Bosun alerts page.  See https://www.cisco.com/c/en/us/support/docs/data-center-analytics/tetration-analytics/212341-how-to-silence-bosun-alerts-from-sentine.html.  Silencing is worth considering once you have determined that an alert is likely a false positive, or when you know that upcoming maintenance on the cluster may trigger alerts.
 
If you still have questions about what you are seeing, please open a TAC case.
 
Bryan